APPROXIMATE MAXIMIZERS OF INTRICACY FUNCTIONALS 



J. BUZZI AND L. ZAMBOTTI 

Abstract. G. Edelman, O. Sporns, and G. Tononi introduced in theoretical 
biology the neural complexity of a family of random variables. This functional is a 
special case of intricacy, i.e., an average of the mutual information of subsystems 
whose weights have good mathematical properties. Moreover, its maximum value 
grows at a definite speed with the size of the system. 

In this work, we compute exactly this speed of growth by building "approximate
maximizers" subject to an entropy condition. These approximate maximizers work
simultaneously for all intricacies. We also establish some properties of arbitrary 
approximate maximizers, in particular the existence of a threshold in the size 
of subsystems of approximate maximizers: most smaller subsystems are almost 
equidistributed, most larger subsystems determine the full system. 

The main ideas are a random construction of almost maximizers with a high
statistical symmetry and the consideration of entropy profiles, i.e., the average
entropies of sub-systems of a given size. The latter gives rise to interesting questions
of probability and information theory.



1. Introduction 

1.1. Neural Complexity, a measure of complexity from theoretical biology. 

In [16], G. Edelman, O. Sporns and G. Tononi introduced the so-called neural
complexity of a family of random variables. It is defined as an average of mutual
information between any subfamily and its complement, see below. It has been
considered from a theoretical and experimental point of view by a number of authors,

see e.g. [D II El El El [IDl [III [El [iSl [HI [B [HI [IE] . 

In order to define the neural complexity, we need to recall two classical definitions. 
If X is a random variable taking values in a finite space E, then its entropy is defined
by

$$H(X) := -\sum_{x\in E} P_X(x)\log P_X(x), \qquad P_X(x) := P(X = x).$$

Given two random variables X and Y defined over the same probability space, the mutual
information between X and Y is

$$MI(X,Y) := H(X) + H(Y) - H(X,Y).$$

We refer to the Appendix for a review of the main properties of the entropy and the 
mutual information. 
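To make the two definitions concrete, here is a minimal Python sketch (our own illustration, not code from the paper) that evaluates $H$ and $MI$ for a finite joint law encoded as a dictionary from configurations (tuples) to probabilities. Natural logarithms are assumed, and the helper names `entropy`, `marginal` and `mutual_information` are hypothetical.

```python
import math
from collections import defaultdict


def entropy(pmf):
    """H = -sum_x p(x) log p(x), natural log; atoms with p(x) = 0 are skipped (0 log 0 = 0)."""
    return -sum(p * math.log(p) for p in pmf.values() if p > 0)


def marginal(pmf, coords):
    """Law of the sub-vector (x_i, i in coords) of a joint law on tuples."""
    out = defaultdict(float)
    for x, p in pmf.items():
        out[tuple(x[i] for i in coords)] += p
    return dict(out)


def mutual_information(pmf, A, B):
    """MI(X_A, X_B) = H(X_A) + H(X_B) - H(X_A, X_B) for disjoint index lists A and B."""
    joint = list(A) + list(B)
    return (entropy(marginal(pmf, A)) + entropy(marginal(pmf, B))
            - entropy(marginal(pmf, joint)))


copies = {(0, 0): 0.5, (1, 1): 0.5}                            # X_2 = X_1, a fair bit
independent = {(a, b): 0.25 for a in (0, 1) for b in (0, 1)}  # two independent fair bits
print(mutual_information(copies, [0], [1]))       # approximately log 2
print(mutual_information(independent, [0], [1]))  # approximately 0
```

Two copies of a fair bit share $\log 2$ nats of information, while two independent fair bits share none.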

Edelman, Sporns and Tononi consider systems formed by a finite family $X = (X_i)_{i\in I}$
and define the following concept of complexity. For any $S\subset I$, they divide
the system into two subsystems:

$$X_S := (X_i,\ i\in S), \qquad X_{S^c} := (X_i,\ i\in S^c),$$

where $S^c := I\setminus S$. Then they compute the mutual information $MI(X_S, X_{S^c})$ and
consider the sum

$$\mathcal I(X) := \sum_{S\subset I}\frac{1}{|I|+1}\,\frac{1}{\binom{|I|}{|S|}}\,MI(X_S, X_{S^c}), \qquad (1.1)$$

where $|I|$ denotes the cardinality of $I$. Note that $\mathcal I(X)$ is really a function of the
law of $X$.
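The sum (1.1) runs over all $2^{|I|}$ subsets, so it can only be evaluated directly for very small systems. The Python sketch below does exactly that, as an illustration under our own conventions (natural logarithms, laws stored as dictionaries, hypothetical helper names); it is not an algorithm from the paper.

```python
import math
from collections import defaultdict
from itertools import combinations, product


def entropy(pmf):
    """H = -sum p log p (natural log), skipping zero-probability atoms."""
    return -sum(p * math.log(p) for p in pmf.values() if p > 0)


def marginal(pmf, coords):
    """Law of (x_i, i in coords) under a joint law on N-tuples."""
    out = defaultdict(float)
    for x, p in pmf.items():
        out[tuple(x[i] for i in coords)] += p
    return dict(out)


def neural_complexity(pmf, N):
    """Brute-force (1.1): sum over S of MI(X_S, X_{S^c}) / ((N + 1) * C(N, |S|))."""
    h_full = entropy(pmf)
    total = 0.0
    for k in range(1, N):  # S = empty set and S = I contribute MI = 0
        weight = 1.0 / ((N + 1) * math.comb(N, k))
        for S in combinations(range(N), k):
            Sc = tuple(i for i in range(N) if i not in S)
            mi = entropy(marginal(pmf, S)) + entropy(marginal(pmf, Sc)) - h_full
            total += weight * mi
    return total


def copies_law(d, N):
    """X_1 uniform on {0, ..., d-1} and X_i = X_1 for every i."""
    return {(a,) * N: 1.0 / d for a in range(d)}


def independent_law(d, N):
    """N independent uniform variables on {0, ..., d-1}."""
    return {x: float(d) ** -N for x in product(range(d), repeat=N)}


print(neural_complexity(copies_law(3, 4), 4))       # approximately (3/5) log 3
print(neural_complexity(independent_law(2, 4), 4))  # approximately 0
```

For the fully redundant system every non-trivial split carries $\log d$ of mutual information, which gives $(1 - 2/(N+1))\log d$; for independent variables the value vanishes, in line with the discussion of independent families in this section.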

As shown in [2], one can define more general functionals

$$\mathcal I^c(X) := \sum_{S\subset I} c^I_S\, MI(X_S, X_{S^c}),$$

which have similar properties, provided the properties of "exchangeability" and
"weak additivity" still hold, see Sec. 2. The resulting functionals have been called
intricacies in [2].

Using a super-additivity argument, we showed in [2] that the maximum value of
any intricacy over systems with a given size grows linearly with the size. In this
paper, we compute exactly this speed of growth by building "approximate
maximizers", i.e., families of an increasing number of random variables taking values in a
fixed set and achieving, in the limit, the maximum intricacy per variable. Moreover,
we shall construct in this paper a sequence of simultaneous approximate maximizers
for all intricacies.

Our construction is probabilistic in a fundamental way. We shall show that
maximizers should approximately satisfy strong symmetries (see Theorem 1.6) that
cannot be satisfied exactly (Lemma 3.8). We shall exhibit a random sequence of
systems which satisfy such symmetries in law, and approximately satisfy the same
symmetries almost surely.

If the family $(X_i)_{i\in I}$ is completely deterministic or, on the contrary, independent,
then every mutual information vanishes and therefore $\mathcal I(X) = 0$. As these examples
suggest, large values of $\mathcal I$ require a compromise between randomness and mutual
dependence, i.e., non-trivial correlation between $X_S$ and $X_{S^c}$ for many subsets
$S$. This explains why maximizing this functional is not a trivial problem.

1.2. Main Results. For the sake of simplicity, we state our results in this
introduction only for the neural complexity (1.1), deferring the analogous results for
arbitrary intricacies to Section 5.

First, we need some notations. The integers $N\ge1$ and $d\ge2$ will denote respectively
the cardinality of the family $(X_i)_{i\in I}$ and of the range of each $X_i$. Moreover,

• $\Lambda_{d,N} := \{0,\dots,d-1\}^N$ is the set of configurations, i.e., of possible values for
the random vector $X$;

• $\mathcal X(d,N)$ is the set of all $\Lambda_{d,N}$-valued random variables $X$, which we shall
identify with $\mathcal M(d,N)$, the set of all probability measures on $\Lambda_{d,N}$.

In particular, we write indifferently $H(X)$ and $H(\mu)$, as well as $\mathcal I(X)$ and $\mathcal I(\mu)$. Of
course, entropy and intricacy are in fact functions of the law $\mu$ of $X$ and not of the
(random) values of $X$.

Let us state our main results in the case of the neural complexity: 

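Theorem 1.1. Let $\mathcal I(X)$ be the neural complexity (1.1) of Edelman-Sporns-Tononi.

(1) For all $\mu\in\mathcal M(d,N)$, setting $x := \frac{H(\mu)}{N\log d}$, we have

$$\frac{\mathcal I(\mu)}{N\log d}\le x(1-x)\le\frac14. \qquad (1.2)$$

(2) The maximum value of the intricacy at fixed size,

$$\mathcal I(d,N) := \max_{X\in\mathcal X(d,N)}\mathcal I(X) = \max_{\mu\in\mathcal M(d,N)}\mathcal I(\mu),$$

satisfies

$$\lim_{N\to\infty}\frac{\mathcal I(d,N)}{N} = \frac{\log d}{4}.$$

(3) For any $x\in[0,1]$, there exists a sequence $\mu^N\in\mathcal M(d,N)$ approaching the
upper bound of point (1), i.e., satisfying

$$\lim_{N\to\infty}\frac{H(\mu^N)}{N\log d} = x, \qquad \lim_{N\to\infty}\frac{\mathcal I(\mu^N)}{N\log d} = x(1-x). \qquad (1.3)$$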

Remark 1.2. We shall actually prove this theorem for arbitrary intricacies (see
Theorem 5.1). More precisely, and perhaps unexpectedly, we shall build, for each
$d\ge2$, a sequence $\mu^N\in\mathcal M(d,N)$ satisfying, simultaneously for all intricacies $\mathcal I^c$,

$$\lim_{N\to\infty}\frac{\mathcal I^c(\mu^N)}{N} = \lim_{N\to\infty}\max_{\mu\in\mathcal M(d,N)}\frac{\mathcal I^c(\mu)}{N},$$

see Remark 5.5 below.

Remark 1.3. All of the above is new, though numerical experiments by previous
authors [16, Fig. 1] had suggested the concavity and the symmetry of the maximal
intricacy given the entropy, but not its quadratic form.

While the upper bound (1.2) follows from direct computations, the existence
of sequences $(\mu^N)_N$ satisfying (1.3) is much less trivial and is the main result of
this paper. As shown in Theorem 1.6 below, such sequences must exhibit a
non-trivial behavior, combining a large amount of local independence and of non-trivial
correlation on a global level.

The existence of approximate x-maximizers, i.e., sequences $\mu^N\in\mathcal M(d,N)$
satisfying (1.3), follows in our approach from a probabilistic construction: we shall
prove that uniform distributions on appropriately chosen random sparse supports
almost surely have the desired properties, see Proposition 4.3 below.

In the course of the proof, we also obtain rather detailed information on the 
structure of approximate x-maximizers. A key notion is the following one. 
Definition 1.4. Given $X\in\mathcal X(d,N)$, its entropy profile is the function
$h_X:[0,1]\to[0,1]$ such that $h_X(0) = 0$,

$$h_X\Big(\frac kN\Big) := \frac{1}{\binom Nk}\sum_{S\subset I,\,|S|=k}\frac{H(X_S)}{N\log d}, \qquad \forall k\in I := \{1,\dots,N\},$$

and $h_X$ is affine on each interval $[\frac{k-1}N, \frac kN]$, $k\in I$.
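The grid values of the entropy profile are averages of subsystem entropies, which can be computed by brute force on small examples. The sketch below is an illustration only (exponential in $N$; helper names are ours, natural logarithms assumed) and returns $h_X(k/N)$, $k = 0,\dots,N$, together with the affine interpolation used between grid points.

```python
import math
from collections import defaultdict
from itertools import combinations, product


def entropy(pmf):
    """H = -sum p log p (natural log), skipping zero-probability atoms."""
    return -sum(p * math.log(p) for p in pmf.values() if p > 0)


def marginal(pmf, coords):
    """Law of (x_i, i in coords) under a joint law on N-tuples."""
    out = defaultdict(float)
    for x, p in pmf.items():
        out[tuple(x[i] for i in coords)] += p
    return dict(out)


def entropy_profile(pmf, N, d):
    """Grid values h_X(k/N), k = 0..N, of Definition 1.4."""
    values = [0.0]
    for k in range(1, N + 1):
        subsets = list(combinations(range(N), k))
        average = sum(entropy(marginal(pmf, S)) for S in subsets) / len(subsets)
        values.append(average / (N * math.log(d)))
    return values


def profile_at(values, t):
    """Affine interpolation of the grid values at t in [0, 1]."""
    N = len(values) - 1
    k = min(int(t * N), N - 1)  # index of the interval [k/N, (k+1)/N] containing t
    s = t * N - k
    return (1 - s) * values[k] + s * values[k + 1]


independent = {x: 1 / 16 for x in product(range(2), repeat=4)}  # four fair bits
copies = {(0,) * 4: 0.5, (1,) * 4: 0.5}                        # one fair bit, copied
print(entropy_profile(independent, 4, 2))
print(entropy_profile(copies, 4, 2))
```

Four independent fair bits give the diagonal profile $t\mapsto t$, while four copies of one fair bit give a profile that reaches $1/N$ at $t = 1/N$ and then stays flat.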
Theorem 1.5. For $x\in[0,1]$, let the ideal profile be

$$h^*_x(t) := x\wedge t = \min\{x, t\}.$$

Then for any sequence $\mu^N\in\mathcal M(d,N)$ of approximate x-maximizers, we have

$$\|h_{\mu^N} - h^*_x\|_{\sup} := \sup_{t\in[0,1]}|h_{\mu^N}(t) - h^*_x(t)|\to0 \quad\text{as } N\to\infty.$$

In particular, for any sequence $\mu^N\in\mathcal M(d,N)$ of approximate maximizers, i.e., such
that $\lim_{N\to\infty}\mathcal I(\mu^N)/N = \log d/4$, we have:

$$\lim_{N\to\infty}\frac{H(\mu^N)}{N\log d} = 1/2, \qquad \lim_{N\to\infty}\|h_{\mu^N} - h^*_{1/2}\|_{\sup} = 0.$$

Again, we prove in fact a version of this result for all intricacies, see Theorem 5.2
below.

If $(\mu^N)_N$ is a sequence of approximate x-maximizers and $X^N\in\mathcal X(d,N)$ has law
$\mu^N$, we say that $(X^N)_N$ is also a sequence of approximate x-maximizers.

A corollary of the convergence of entropy profiles is the existence of a threshold
in the behavior of typical subsystems of approximate x-maximizers: if $|S| < xN$,
then $X^N_S$ is almost uniform, which corresponds to local independence; if $|S| > xN$,
then the whole family $X^N$ is almost a function of $X^N_S$, which corresponds to strong
global correlation. Recall that $H(Y\mid Z)$ is the conditional entropy of $Y$ given $Z$, see
the Appendix below.

Theorem 1.6. Let $(X^N)_N$ be an approximate x-maximizer. Let $y\in\,]0,1[$ and set
$k_N := \lfloor yN\rfloor$. Consider the $\binom N{k_N}$ sub-systems $X_S$ of $X^N$ of size $k_N = |S|$. For all
$\varepsilon > 0$, if $N$ is large enough, then except for at most $\varepsilon\binom N{k_N}$ of such subsets $S$, the
following holds:

• if $y < x$: $X_S$ is almost uniform: $1-\varepsilon\le\frac{H(X_S)}{|S|\log d}\le1$;

• if $y > x$: $X_S$ almost determines the whole system $X^N$: $0\le\frac{H(X^N\mid X_S)}{N\log d}\le\varepsilon$.

Again, we prove a more general version of this result in Theorem 5.6 below.

1.3. Strategy of Proof and Organization of the paper. The main ideas of
the proofs of the two theorems are a probabilistic construction of sequences of
approximate maximizers and the consideration of the entropy profiles $h_X$ defined above. As we
indicated, we in fact analyze arbitrary intricacies generalizing the neural complexity.
In section 2 we recall the notion of intricacy as a family of functionals over finite
sets of discrete random variables satisfying exchangeability and weak additivity and
give simple examples. In section 3 we give upper bounds on the intricacies of
arbitrary systems of given size and entropy. In section 4 we prove the main results by
means of a probabilistic construction of random approximate maximizers. In section
5 we collect our results for arbitrary intricacies. An Appendix contains basic facts
from entropy theory for the convenience of the reader.

1.4. Further questions. The bound $x(1-x)$ in (1.3) is symmetric with respect to
$x = 1/2$ and independent of $d\ge2$. We do not know whether these simple properties,
which extend to arbitrary intricacies (see Theorem 5.1), can be proved directly, e.g.:
does there exist a duality operation in $\mathcal X(d,N)$ exchanging systems with entropy
$xN\log d$ and $(1-x)N\log d$ while preserving their intricacy? Can one deduce from a
system in $\mathcal X(d,N)$ with entropy $H$ and intricacy $I$ a system in $\mathcal X(d',N)$ with entropy
$(\log d'/\log d)H$ and intricacy $(\log d'/\log d)I$?

This work has focused on properties of systems with size tending to infinity. Notice
that we know very little about the exact maximizers for fixed size beyond the constraints
on their entropy contained in our main results. Because of the invariance properties
of intricacy (see Lemma 2.8 and the following comment), exact maximizers are
non-unique, but we do not even know if there are only finitely many of them.

Our construction of approximate maximizers is probabilistic. Could it be done 
deterministically? Would the corresponding algorithms possess a computational 
complexity related to the complexity that intricacies are supposed to describe? 

Our construction is global, but could systems with maximum intricacy be built by
a local approach, i.e., a "biologically reasonable" building process, using some type
of local rules and/or evolution? That is, does there exist a "reasonable" self-map
$T:\mathcal M(d,N)\to\mathcal M(d,N)$ such that the neural complexity of $T^n(\mu)$ converges to the
maximum as $n\to\infty$ for "many" $\mu\in\mathcal M(d,N)$?

Our work also leads to interesting probabilistic constructions and questions in the 
theory of entropy and information. For instance: 

Problem. Describe the set of functions $h:\{0,\dots,N\}\to\mathbb R$ obtained from picking
$X\in\mathcal X(d,N)$ and setting $h(k)$ to be the average entropy of $X_S$ where $S$ ranges over
the subsets of $\{1,\dots,N\}$ with cardinality $k$.

Basic properties of the entropy (recalled in the Appendix) imply that $h(0) = 0$,

$$0\le h(k+1) - h(k)\le\log d \quad\text{for } 0\le k<N,$$

and

$$h(k+\ell) - h(k)\le h(j+\ell) - h(j) \quad\text{for } 0\le j\le k\le k+\ell\le N.$$

However, we shall show that not all such functions $h$ arise from some $X\in\mathcal X(d,N)$,
see Lemma 3.8. See [S] for a closely related question.
2. Intricacy

2.1. Definition. In this paper, a system is a finite collection $(X_i)_{i\in I}$ of random
variables, each $X_i$ taking values in the same finite set $V$. Without loss of generality,
we assume that $V = \{0,\dots,d-1\}$ for all $i\in I$ and some $d\ge2$ ($d$ should be thought
of as a convenient normalization) and that $I$ is a set of positive integers. We let
$\mathcal M := \bigcup_{d\ge2}\mathcal M(d) = \bigcup_{d\ge2}\bigcup_{N\ge1}\mathcal M(d,N)$ denote the corresponding laws, that is, the
probability measures on $\{0,\dots,d-1\}^I$ for each finite subset $I\subset\mathbb N^* := \{1,2,3,\dots\}$.

For $S\subset I$, we denote

$$X_S := (X_i,\ i\in S).$$

In [2], we defined the following family of functionals over such systems (more
precisely: over their laws) formalizing (and slightly generalizing) the neural complexity
of Edelman-Sporns-Tononi [16]:

Definition 2.1. A system of coefficients is a collection of numbers

$$c := (c^I_S:\ I\subset\subset\mathbb N^*,\ S\subset I),$$

i.e., $I$ ranges over the finite subsets of $\mathbb N^*$, such that for all $I$ and all $S\subset I$:

$$c^I_S\ge0, \qquad \sum_{S\subset I}c^I_S = 1, \qquad c^I_{S^c} = c^I_S, \qquad (2.1)$$

where $S^c := I\setminus S$. The corresponding mutual information functional is
$\mathcal I^c:\mathcal M\to\mathbb R$ defined by:

$$\mathcal I^c(X) := \sum_{S\subset I}c^I_S\,MI(X_S, X_{S^c}).$$

By convention, $MI(X_\emptyset, X_I) = MI(X_I, X_\emptyset) = 0$. An intricacy is a mutual
information functional satisfying:

(1) exchangeability (invariance by permutations): if $I, J\subset\subset\mathbb N^*$ and $\phi: I\to J$
is a bijection, then $\mathcal I^c(X) = \mathcal I^c(Y)$ for any $X := (X_i)_{i\in I}$, $Y := (X_{\phi^{-1}(j)})_{j\in J}$;

(2) weak additivity: for any two independent sub-systems $(X_i)_{i\in I}$, $(Y_j)_{j\in J}$
(defined on the same probability space): $\mathcal I^c(X,Y) = \mathcal I^c(X) + \mathcal I^c(Y)$.

$\mathcal I^c$ is non-null if some coefficient $c^I_S$ with $S\notin\{\emptyset, I\}$ is not zero.

2.2. Classification of intricacies. In section 3 of [2] the following has been proved.

Proposition 2.2. A mutual information functional $\mathcal I^c$ determines its coefficients
uniquely and the following equivalences hold:

• $\mathcal I^c$ is exchangeable if and only if $c^I_S$ depends only on $|I|$ and $|S|$;

• an exchangeable $\mathcal I^c$ is weakly additive if and only if there exists a random
variable $W_c\in[0,1]$ such that $W_c$ and $1-W_c$ have the same law and

$$c^I_S = E\big((1-W_c)^{|I|-|S|}\,W_c^{|S|}\big) = \int_{[0,1]}x^{|S|}(1-x)^{|I|-|S|}\,\lambda_c(dx), \qquad (2.2)$$

where $\lambda_c$ is the law of $W_c$;

• an exchangeable weakly additive $\mathcal I^c$ is non-null if and only if $\lambda_c(]0,1[) > 0$, in which
case all coefficients $c^I_S$ are non-zero.
In this paper we consider only non-null intricacies $\mathcal I^c$.

Example 2.3. The intricacy $\mathcal I$ of Edelman-Sporns-Tononi is defined by the
coefficients:

$$c^I_S := \frac{1}{|I|+1}\,\frac{1}{\binom{|I|}{|S|}}, \qquad (2.3)$$

and it is easy to see that in this case (2.2) holds with $W_c$ a uniform variable over
$[0,1]$, see Lemma 3.8 in [2]. For $0<p<1$, the symmetric p-intricacy is
defined by

$$c^I_S := \frac12\Big(p^{|S|}(1-p)^{|I\setminus S|} + (1-p)^{|S|}p^{|I\setminus S|}\Big),$$

and in this case $W_c$ is uniform on $\{p, 1-p\}$. For $p = 1/2$, this yields the uniform
intricacy $\mathcal I^U(X)$ with

$$c^I_S := 2^{-|I|}$$

and $W_c = 1/2$ almost surely. All these functionals are clearly non-null and
exchangeable.
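Formula (2.2) gives a direct way to tabulate the coefficients. The Python sketch below (our own illustration, exact rational arithmetic via `fractions.Fraction`, hypothetical helper names) computes $c^n_k := c^I_S$ for $|I| = n$, $|S| = k$ when $W_c$ has a finite law, compares (2.3) with the Beta integral $\int_0^1 x^k(1-x)^{n-k}dx = k!(n-k)!/(n+1)!$, and checks the normalization and symmetry required in (2.1).

```python
import math
from fractions import Fraction


def coefficient(n, k, law):
    """c^n_k = E[W^k (1 - W)^(n - k)] for W with finite law {w: P(W = w)}, as in (2.2)."""
    return sum(p * w ** k * (1 - w) ** (n - k) for w, p in law.items())


def esn_coefficient(n, k):
    """Edelman-Sporns-Tononi coefficient (2.3): 1 / ((n + 1) C(n, k))."""
    return Fraction(1, (n + 1) * math.comb(n, k))


def beta_integral(n, k):
    """Exact integral of x^k (1 - x)^(n - k) over [0, 1], i.e. (2.2) for W uniform on [0, 1]."""
    return Fraction(math.factorial(k) * math.factorial(n - k), math.factorial(n + 1))


def p_law(p):
    """Law of W_c for the symmetric p-intricacy: uniform on {p, 1 - p} (a point mass if p = 1/2)."""
    if p == 1 - p:
        return {p: Fraction(1)}
    return {p: Fraction(1, 2), 1 - p: Fraction(1, 2)}


n = 6
print([esn_coefficient(n, k) for k in range(n + 1)])
print([coefficient(n, k, p_law(Fraction(3, 10))) for k in range(n + 1)])
```

Since $c^I_S$ depends only on $|S|$, the normalization in (2.1) reads $\sum_{k=0}^n\binom nk c^n_k = 1$, which is what the checks below verify.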

Remark 2.4. The global $1/(|I|+1)$ factor in (2.3) is not present in [16], which
did not compare systems of different sizes. However, it is necessary in order to have
weak additivity.

2.3. Simple examples. Let $X_i$ take values in $\{0,\dots,d-1\}$ for all $i\in I$, a finite
subset of $\mathbb N^*$.

Example 2.5. If the variables $X_i$ are independent, then each mutual information is
zero and therefore $\mathcal I^c(X) = 0$. □

Example 2.6. If each $X_i$ is a.s. equal to a constant in $\{0,\dots,d-1\}$, then, for
any $S\subset I$, $H(X_S) = 0$. Hence, $\mathcal I^c(X) = 0$. □

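Example 2.7. Let $i_0\in I$. If $X_{i_0}$ is uniform on $\{0,\dots,d-1\}$ and $X_i = X_{i_0}$ for all $i\in I$, then,
for any $S\ne\emptyset$, $H(X_S) = \log d$ and, if additionally $S^c\ne\emptyset$, $H(X_S\mid X_{S^c}) = 0$, so that
each mutual information $MI(X_S, X_{S^c})$ is $\log d$. Hence,

$$\mathcal I^c(X) = \sum_{S\subset I,\ S\ne\emptyset, I}c^I_S\log d = (1 - c^I_\emptyset - c^I_I)\log d < \log d. \qquad\square$$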

Examples 2.5 and 2.6 correspond to, respectively, maximal and minimal total
entropy. In these extreme cases $\mathcal I^c = 0$. Example 2.7 has positive total entropy and
intricacy (if all $c^I_S$ are non-zero). However, the values of the intricacy grow very
slowly with $|I|$ in these examples: they stay bounded. We shall see, however, how to
build systems $(X_i)_{i\in I}\in\mathcal X(d,I)$ which realize much larger values of $\mathcal I^c$, namely of
the order of $|I|$.
2.4. Invariance properties of intricacies. We have the following obvious
invariances of intricacies.

Lemma 2.8. The intricacies are invariant under the following group actions on
$\mathcal X(d,N)$, for any $N, d\ge1$:

(1) the group of permutations of $\{1,\dots,N\}$, acting on $\mathcal X(d,N)$ by $(\sigma X)_i =
X_{\sigma^{-1}(i)}$, $\forall i = 1,\dots,N$;

(2) the $N$th power $(\mathcal S_d)^N$ of the permutation group of $\{0,\dots,d-1\}$, acting on
$\mathcal X(d,N)$ by $(aX)_i = a_i\circ X_i$, $\forall i = 1,\dots,N$.

In particular, for $N, d\ge2$, the maximum of $\mathcal I^c$ over $X\in\mathcal X(d,N)$ cannot be
achieved at a single probability measure on $\Lambda_{d,N} = \{0,\dots,d-1\}^N$. Indeed, if it
were the case, then this measure would be invariant under the group action (2)
above. However, this action is transitive on $\Lambda_{d,N}$. Therefore the measure would be
equidistributed on this set. Hence the maximizer would be a family of independent
variables, for which the intricacies are all zero. This is a contradiction whenever
$N, d\ge2$.

3. Upper bounds on intricacies 

In [2], it was proved that $\mathcal I^c(X)\le N\log d/2$ if $X\in\mathcal X(d,N)$. By comparison with
"ideal entropy profiles" defined below, we prove sharper upper bounds for systems
with given size and entropy.

3.1. Definitions. We define the ideal entropy profile and the corresponding
intricacy values both for finite size and in the limit $N\to\infty$. We also introduce an
adapted norm to measure the distance between profiles.

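Let $\mathcal I^c$ be some intricacy. It is convenient to use the following probabilistic
representation of the coefficients $c$, based on the random variable $W_c$ with law $\lambda_c$
defined by (2.2). Let $(Y_i)_{i\ge1}$ be a sequence of i.i.d. uniform random variables on
$[0,1]$, independent of $W_c$, and let

$$D_N := \sum_{k=1}^N\mathbb 1_{(Y_k<W_c)}, \qquad \beta_N := \frac{D_N}{N}, \qquad N\ge1. \qquad (3.1)$$

Conditionally on $W_c$, $D_N$ is a binomial variable with parameters $(N, W_c)$. In
particular, writing $c^N_k := c^I_S$ for $|I| = N$ and $|S| = k$, for all $g:\mathbb N\to\mathbb R$ we have by (2.2)

$$E(g(D_N)) = \int_{[0,1]}\sum_{k=0}^N\binom Nk x^k(1-x)^{N-k}g(k)\,\lambda_c(dx) = \sum_{k=0}^N c^N_k\binom Nk g(k), \qquad (3.2)$$

and therefore, for all bounded Borel $f:[0,1]\to\mathbb R$,

$$E(f(\beta_N)) = \sum_{k=0}^N c^N_k\binom Nk f\Big(\frac kN\Big). \qquad (3.3)$$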
We recall the Definition 1.4 of the entropy profile of $X\in\mathcal X(d,N)$: $h_X(0) = 0$,

$$h_X\Big(\frac kN\Big) := \frac{1}{\binom Nk}\sum_{S\subset I,\,|S|=k}\frac{H(X_S)}{N\log d}, \qquad k\in I,$$

and $h_X$ is affine on each interval $[\frac{k-1}N, \frac kN]$, $k\in I$. We can now define the ideal
profiles and their intricacies.

Definition 3.1. For $x\in[0,1]$ and $N\ge1$, the ideal entropy profile is

$$h^*_x(t) := t\wedge x = \min\{t, x\}, \qquad (3.4)$$

and the corresponding (normalized) intricacies are, for finite $N$:

$$i^c_N(x) := 2\sum_{k=0}^N c^N_k\binom Nk h^*_x(k/N) - x = 2E(x\wedge\beta_N) - x \qquad (3.5)$$

and, for $N\to\infty$:

$$i^c(x) := 2\int_0^1(t\wedge x)\,\lambda_c(dt) - x = 2E(x\wedge W_c) - x. \qquad (3.6)$$

We remark that the ideal profile $h^*_x$ does not depend on the intricacy $\mathcal I^c$. Finally,
we define a family of norms. For all bounded Borel $f:[0,1]\to\mathbb R$, let

$$\|f\|_{c,N} := 2\sum_{k=0}^N c^N_k\binom Nk|f(k/N)| = 2E(|f(\beta_N)|). \qquad (3.7)$$

Remark 3.2. For the particular cases of Example 2.3 we have more explicit
expressions. For the Edelman-Sporns-Tononi neural complexity, the above reduces
to

$$i(x) = x(1-x), \qquad x\in[0,1],$$

for the uniform intricacy

$$i^U(x) = \min\{x, 1-x\}, \qquad x\in[0,1],$$

and for the symmetric p-intricacy

$$i^p(x) = \min\{x, 1-x, p, 1-p\}, \qquad x\in[0,1].$$
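These closed forms, the bound (1.2) and the estimate of Lemma 3.4 below can be checked on a grid. For the neural complexity, (2.3) gives $c^N_k\binom Nk = 1/(N+1)$, so by (3.2) $D_N$ is uniform on $\{0,\dots,N\}$ and (3.5) becomes an average of $\min\{x, k/N\}$. The Python sketch below (our illustration; function names are hypothetical) implements (3.5) in that case and (3.6) for a $W_c$ with finite law.

```python
import math


def esn_i_N(x, N):
    """(3.5) for the neural complexity, where D_N is uniform on {0, ..., N}."""
    return 2 * sum(min(x, k / N) for k in range(N + 1)) / (N + 1) - x


def i_limit(x, law):
    """(3.6) for W_c with a finite law {w: P(W_c = w)}: 2 E(min(x, W_c)) - x."""
    return 2 * sum(p * min(x, w) for w, p in law.items()) - x


grid = [j / 50 for j in range(51)]
uniform_W = {0.5: 1.0}      # uniform intricacy: W_c = 1/2 almost surely
p_W = {0.3: 0.5, 0.7: 0.5}  # symmetric p-intricacy with p = 0.3
# should be <= 0 up to floating-point rounding, by the bound (1.2)
print(max(esn_i_N(x, 20) - x * (1 - x) for x in grid))
```

The finite-$N$ values stay below the limit $x(1-x)$ for the neural complexity and approach it at rate $O(1/\sqrt N)$ or better.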

3.2. Upper bounds and distance from the ideal profile. In this section we
prove the following upper bounds.

Proposition 3.3. Let $\mathcal I^c$ be an intricacy.

(1) $i^c:[0,1]\to[0,1]$ is a concave function admitting the Lipschitz constant 1 and
symmetric about 1/2: $i^c(1-x) = i^c(x)$. Moreover, $i^c(1/2) = \max_{x\in[0,1]}i^c(x)$.

(2) $|i^c(x) - i^c_N(x)|\le 1/\sqrt N$.

(3) All systems $X\in\mathcal X(d,N)$ with $\frac{H(X)}{N\log d} = x$ satisfy:

$$\frac{\mathcal I^c(X)}{N\log d} = i^c_N(x) - \|h_X - h^*_x\|_{c,N}\le i^c_N(x). \qquad (3.8)$$

(4) If $X^N\in\mathcal X(d,N)$ and $\lim_{N\to\infty}\frac{H(X^N)}{N\log d} = x$, then

$$\limsup_{N\to\infty}\frac{\mathcal I^c(X^N)}{N\log d}\le i^c(x). \qquad (3.9)$$
Observe that showing that $i^c(x)$ is indeed the value of the limit in (3.9), rather than
a mere upper bound, requires proving the existence of sequences saturating the
inequality. This is deferred to the next section. Before proving Proposition 3.3 we
need some preliminary material which will also be useful later.



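3.3. The functions $i^c_N(x)$ and $i^c(x)$. We consider the first two points of the
proposition, beginning with the convergence of $i^c_N(x)$ to $i^c(x)$.

Lemma 3.4. For all $x\in[0,1]$,

$$|i^c_N(x) - i^c(x)|\le\frac{1}{\sqrt N}, \qquad N\ge1.$$

Proof. We use the probabilistic representations (3.6) and (3.5) and, since $t\mapsto x\wedge t$
is 1-Lipschitz, we obtain

$$|i^c_N(x) - i^c(x)| = 2\,|E(x\wedge\beta_N - x\wedge W_c)|\le 2E(|\beta_N - W_c|)\le 2\sqrt{E(|\beta_N - W_c|^2)}.$$

Since $D_N = N\beta_N$ is, conditionally on $W_c$, a binomial variable with parameters
$(N, W_c)$, we have that

$$E(|\beta_N - W_c|^2) = E\big(\mathrm{Var}(\beta_N\mid W_c)\big) = E\Big(\frac{W_c(1-W_c)}{N}\Big)\le\frac{1}{4N}, \qquad (3.10)$$

and the result is proven. □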

Now, we analyze the limit function $i^c(x)$.
Lemma 3.5.

(1) $i^c(x) = E(\min\{x, 1-x, W_c, 1-W_c\})$ for all $x\in[0,1]$.

(2) The function $i^c:[0,1]\to[0,1]$ is 1-Lipschitz and concave. The distributional
second derivative of $i^c$ is $-2\lambda_c$.

(3) $i^c(x) = i^c(1-x)$ for all $x\in[0,1]$.

(4) $i^c$ achieves its maximum at $x = 1/2$ and $i^c(1/2) = E(W_c\wedge(1-W_c))$.

(5) $i^c$ is maximum only at $x = 1/2$ if and only if $1/2$ belongs to the support of $\lambda_c$.

Proof. First, for all $x, a\in[0,1]$,

$$x\wedge a + x\wedge(1-a) - x = \min\{x, 1-x, a, 1-a\}. \qquad (3.11)$$

Indeed, one can assume $a\le1-a$ and then check the above in the three cases
$x\le a$, $a\le x\le1-a$ and $x\ge1-a$. Since $W_c$ has the same law as $1-W_c$, points (1)
and (3) follow.

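Concavity, 1-Lipschitz continuity and symmetry w.r.t. 1/2 follow easily. Moreover,
an integration by parts shows that for all $\varphi\in C^\infty(\mathbb R)$ with compact support
contained in $(0,1)$:

$$\int_{[0,1]}\varphi''(x)\,i^c(x)\,dx = 2\int_{[0,1]}\int_{[0,1]}\varphi''(x)\,(x\wedge t)\,dx\,\lambda_c(dt) - \int_{[0,1]}\varphi''(x)\,x\,dx$$

$$= 2\int_{[0,1]}\Big(\int_{[0,t]}\varphi''(x)\,x\,dx + \int_{[t,1]}\varphi''(x)\,t\,dx\Big)\lambda_c(dt)$$

$$= 2\int_{[0,1]}\big[\varphi'(t)\,t - \varphi(t) - t\,\varphi'(t)\big]\,\lambda_c(dt) = -2\int_{[0,1]}\varphi(t)\,\lambda_c(dt),$$

where we used $\int_{[0,1]}\varphi''(x)\,x\,dx = 0$, proving that $(d^2/dx^2)\,i^c = -2\lambda_c$ as distributions.
Point (2) is proved.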

Since $i^c$ is concave and $i^c(x) = i^c(1-x)$, we have $i^c(x)\le i^c(1/2) = E(W_c\wedge(1-W_c))$
for all $x\in[0,1]$. Point (4) is proved.

Let us now assume $x\le1/2$ (the other case being similar) so that $x = x\wedge(1-x)$.
Set $w := W_c\wedge(1-W_c)$. Then, by (3.11),

$$i^c(1/2) - i^c(x) = E(w - x\wedge w) = E\big((w-x)\,\mathbb 1_{(x<w)}\big).$$

Hence, $i^c(x) < i^c(1/2)$ if and only if $P(x<w) > 0$, i.e., $1/2$ is the unique maximum
point if and only if $\lambda_c(]x, 1-x[) > 0$ for all $x<1/2$. This proves the last point. □
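Identity (3.11) only involves minima of affine functions, so it can be confirmed exactly on a rational grid. The Python sketch below (our illustration, hypothetical function names) does so, and also illustrates point (5): for the symmetric p-intricacy with $p = 3/10$, $\lambda_c$ is carried by $\{p, 1-p\}$, which does not contain $1/2$, and $i^c(x) = \min\{x, 1-x, p, 1-p\}$ has a flat top; for the neural complexity ($\lambda_c$ uniform on $[0,1]$), $x(1-x)$ is maximal only at $x = 1/2$.

```python
from fractions import Fraction


def lhs(x, a):
    """Left-hand side of (3.11): min(x, a) + min(x, 1 - a) - x."""
    return min(x, a) + min(x, 1 - a) - x


def rhs(x, a):
    """Right-hand side of (3.11)."""
    return min(x, 1 - x, a, 1 - a)


def i_p(x, p):
    """Limit function of the symmetric p-intricacy, via Lemma 3.5 (1) and (3.11)."""
    return min(x, 1 - x, p, 1 - p)


grid = [Fraction(j, 40) for j in range(41)]
p = Fraction(3, 10)
maximizers_p = [x for x in grid if i_p(x, p) == max(i_p(y, p) for y in grid)]
maximizers_esn = [x for x in grid if x * (1 - x) == max(y * (1 - y) for y in grid)]
print(maximizers_p[0], maximizers_p[-1])  # 3/10 7/10
print(maximizers_esn)                     # [Fraction(1, 2)]
```

Exact rationals avoid any floating-point tie-breaking when locating the maximizers.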

3.4. Intricacy as a function of the profile. Let us set

$$\Gamma := \{h:[0,1]\to[0,1]:\ h(0) = 0,\ t\mapsto h(t)\text{ is non-decreasing and 1-Lipschitz}\}$$

and for any real number $x\in[0,1]$

$$\Gamma_x := \{h\in\Gamma:\ h(1) = x\}.$$

These sets are endowed with the partial order: $h\le g$ if and only if $h(t)\le g(t)$ for
all $t\in[0,1]$. Each $\Gamma_x$ has a unique maximal element: the previously introduced
ideal entropy profile, $h^*_x(t) = t\wedge x$.

Lemma 3.6. For any $X\in\mathcal X(d,N)$, the entropy profile $h_X$, defined according to
Definition 1.4, belongs to $\Gamma$.

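Proof. Let $X\in\mathcal X(d,N)$. Setting $I := \{1,\dots,N\}$ and

$$H_k := \frac{1}{\binom Nk}\sum_{S\subset I,\,|S|=k}H(X_S), \qquad k = 0,1,\dots,N,$$

we must prove that

$$0 = H_0\le H_1\le\dots\le H_N = H(X), \qquad H_{k+1} - H_k\le\log d, \quad 0\le k<N.$$

The equalities $H_0 = 0$ and $H_N = H(X)$ are obvious. Let $0\le k<N$ and compute:

$$H_{k+1} = \frac{1}{\binom N{k+1}}\sum_{|S|=k+1}H(X_S) = \frac{1}{\binom N{k+1}}\,\frac{1}{k+1}\sum_{|S|=k}\sum_{i\in S^c}H(X_{S\cup\{i\}})$$

$$\le\frac{1}{\binom N{k+1}}\,\frac{1}{k+1}\sum_{|S|=k}(N-k)\big(H(X_S) + \log d\big) = H_k + \log d,$$

where the second equality holds because each set of size $k+1$ is obtained as $S\cup\{i\}$
in exactly $k+1$ ways, and the inequality uses $H(X_{S\cup\{i\}})\le H(X_S) + H(X_i)$ (subadditivity
of the entropy) and $H(X_i)\le\log d$, see the Appendix. The same computation, since
$H(X_{S\cup\{i\}})\ge H(X_S)$ (monotonicity of the entropy), proves $H_k\le H_{k+1}$. □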
Let, for any $h\in\Gamma$,

$$\mathcal G^c_N(h) := 2\sum_{k=0}^N c^N_k\binom Nk h(k/N) - h(1) = 2E(h(\beta_N)) - h(1).$$

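Lemma 3.7. Fix $x\in[0,1]$.

(1) For all $X\in\mathcal X(d,N)$, $\mathcal G^c_N(h_X) = \frac{\mathcal I^c(X)}{N\log d}$.

(2) $h^*_x$ is the unique maximizer of $\mathcal G^c_N$ in $\Gamma_x$ and $\mathcal G^c_N(h^*_x) = i^c_N(x)$.

(3) For arbitrary $h\in\Gamma_x$, we have

$$\|h - h^*_x\|_{c,N} = |\mathcal G^c_N(h) - \mathcal G^c_N(h^*_x)|. \qquad (3.12)$$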

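Proof. Since $MI(X,Y) = H(X) + H(Y) - H(X,Y)$, $c^I_S = c^I_{S^c}$, and $\sum_S c^I_S = 1$, we
obtain

$$\mathcal I^c(X) = \sum_{S\subset I}c^I_S\big(H(X_S) + H(X_{S^c}) - H(X)\big) = 2\sum_{k=0}^N c^N_k\sum_{|S|=k}H(X_S) - H(X).$$

Hence, the intricacy can be computed from the entropy profile:

$$\frac{\mathcal I^c(X)}{N\log d} = 2\sum_{k=0}^N c^N_k\binom Nk h_X(k/N) - h_X(1) = \mathcal G^c_N(h_X), \qquad (3.13)$$

and (1) is proved.

A direct computation yields, for arbitrary $h\in\Gamma_x$:

$$|\mathcal G^c_N(h^*_x) - \mathcal G^c_N(h)| = \mathcal G^c_N(h^*_x) - \mathcal G^c_N(h) = 2\sum_{k=0}^N c^N_k\binom Nk\big(h^*_x(k/N) - h(k/N)\big)$$

$$= 2\sum_{k=0}^N c^N_k\binom Nk\big|h^*_x(k/N) - h(k/N)\big| = \|h^*_x - h\|_{c,N},$$

since each term is non-negative: every $h\in\Gamma_x$ satisfies $h(t)\le t$ (by the Lipschitz
condition and $h(0) = 0$) and $h(t)\le h(1) = x$ (by monotonicity), i.e., $h\le h^*_x$. This proves (3).

Observe that $\mathcal G^c_N:\Gamma_x\to\mathbb R$ is monotone non-decreasing. Hence, setting $x = \frac{H(X)}{N\log d}$
and recalling (3.5),

$$\frac{\mathcal I^c(X)}{N\log d} = \mathcal G^c_N(h_X)\le\sup_{h\in\Gamma_x}\mathcal G^c_N(h) = \mathcal G^c_N(h^*_x) = 2E(x\wedge\beta_N) - x = i^c_N(x).$$

Moreover, $\mathcal I^c$ being non-null, all $c^N_k$ are positive, $\mathcal G^c_N$ is increasing and $h^*_x$ is a
maximizer. Uniqueness of the maximizer in $\Gamma_x$ follows from (3.12), and point (2) is
proved. □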
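Formula (3.13) and the bound of point (2) can be verified numerically on a small random law. The Python sketch below (our illustration, with the neural complexity coefficients (2.3), natural logarithms and hypothetical helper names) computes $\mathcal I^c(X)/(N\log d)$ once from the definition and once as $\mathcal G^c_N(h_X)$, and compares the latter with $i^c_N(x)$.

```python
import math
import random
from collections import defaultdict
from itertools import combinations, product


def entropy(pmf):
    """H = -sum p log p (natural log), skipping zero-probability atoms."""
    return -sum(p * math.log(p) for p in pmf.values() if p > 0)


def marginal(pmf, coords):
    """Law of (x_i, i in coords) under a joint law on N-tuples."""
    out = defaultdict(float)
    for x, p in pmf.items():
        out[tuple(x[i] for i in coords)] += p
    return dict(out)


def esn_c(N, k):
    """Neural complexity coefficient c^N_k = 1 / ((N + 1) C(N, k)), see (2.3)."""
    return 1.0 / ((N + 1) * math.comb(N, k))


def intricacy_direct(pmf, N, c):
    """Sum over all S of c^N_{|S|} MI(X_S, X_{S^c})."""
    h_full = entropy(pmf)
    total = 0.0
    for k in range(N + 1):
        for S in combinations(range(N), k):
            Sc = tuple(i for i in range(N) if i not in S)
            total += c(N, k) * (entropy(marginal(pmf, S)) + entropy(marginal(pmf, Sc)) - h_full)
    return total


def profile_grid(pmf, N, d):
    """Grid values h_X(k/N), k = 0..N (Definition 1.4)."""
    return [sum(entropy(marginal(pmf, S)) for S in combinations(range(N), k))
            / (math.comb(N, k) * N * math.log(d)) for k in range(N + 1)]


def G(h, N, c):
    """G^c_N(h) = 2 sum_k c^N_k C(N, k) h(k/N) - h(1), from grid values h[0..N]."""
    return 2 * sum(c(N, k) * math.comb(N, k) * h[k] for k in range(N + 1)) - h[N]


def i_N(x, N, c):
    """(3.5): G^c_N evaluated at the ideal profile min(t, x)."""
    return 2 * sum(c(N, k) * math.comb(N, k) * min(x, k / N) for k in range(N + 1)) - x


def random_law(d, N, seed=3):
    """A random law on {0, ..., d-1}^N with i.i.d. uniform weights, normalized."""
    rng = random.Random(seed)
    atoms = list(product(range(d), repeat=N))
    weights = [rng.random() for _ in atoms]
    total = sum(weights)
    return {x: w / total for x, w in zip(atoms, weights)}


d, N = 2, 4
law = random_law(d, N)
h = profile_grid(law, N, d)
direct = intricacy_direct(law, N, esn_c) / (N * math.log(d))
print(direct, G(h, N, esn_c), i_N(h[N], N, esn_c))
```

The first two printed numbers should agree up to rounding, and neither should exceed the third.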

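3.5. Proof of Proposition 3.3. Points (1) and (2) are Lemmas 3.5 and 3.4. Formula (3.8)
follows from Lemma 3.7 since, for $x = \frac{H(X)}{N\log d}$,

$$\frac{\mathcal I^c(X)}{N\log d} = \mathcal G^c_N(h_X) = \mathcal G^c_N(h^*_x) + \mathcal G^c_N(h_X) - \mathcal G^c_N(h^*_x) = i^c_N(x) - \|h_X - h^*_x\|_{c,N}.$$

To prove (3.9), it is enough to use (3.8), together with the continuity of $i^c$ and the
uniform convergence of $i^c_N$ to $i^c$. Proposition 3.3 is proved.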
3.6. No system with the ideal profile. We turn to the problem of maximizing
$\mathcal I^c$ over $\mathcal X(d,N)$ at fixed $N$ for a prescribed value of the entropy $H(X)$. The above
results show that a system $X\in\mathcal X(d,N)$ such that $h_X(k/N) = h^*_x(k/N)$ ($k =
0,1,\dots,N$) with $x = \frac{H(X)}{N\log d}$ would be an exact maximizer. However, the next lemma
shows that such $X$ cannot exist unless $H(X)/\log d$ or $N - H(X)/\log d$ is bounded
independently of $N$. Thus, all we can hope for is to find systems which approach the
ideal profile. This will be done in section 4.

Lemma 3.8. For each $d\ge2$, there exists $H_* = H_*(d) < \infty$ with the following
property. If $N\ge1$ and $Y_1,\dots,Y_N$ are random variables taking values in $\{0,\dots,d-1\}$
and defined on the same probability space such that, for some real number
$H\in[0,N]$,

$$\frac{H(Y_{\sigma(1)},\dots,Y_{\sigma(k)})}{\log d} = k\wedge H, \qquad \forall\sigma\in\mathcal S_N,\ \forall k = 1,\dots,N,$$

then $H < H_*$ or $N - H < H_*$.

Proof. Let $K := \lfloor H\rfloor$ and $\bar K := \lceil H\rceil$. Without loss of generality, we assume
that $\bar K\ge3$ (otherwise $H\le\bar K<3$). Let us condition on the variables
$(Y_3,\dots,Y_{\bar K})$ (in the following paragraphs we simply write "conditional" for
"conditional on $(Y_3,\dots,Y_{\bar K})$"). By assumption:

• $(Y_1, Y_2)$ belongs to $Z := \{0,\dots,d-1\}^2$;

• each $Y_i$, $i\in\{1, 2, \bar K+1,\dots,N\}$, is a function of $(Y_1, Y_2)$ (trivially if $i\in\{1,2\}$),
as the conditional entropy of $(Y_1, Y_2, Y_i)$ is not bigger than that of $(Y_1, Y_2)$. Moreover,
the conditional entropy of $Y_i$ is $\log d$. Hence, each such $Y_i$ defines a partition $Z_i$ of
$Z$ into $d$ subsets;

• for any pair $i\ne j$ in $\{1, 2, \bar K+1,\dots,N\}$, $(Y_i, Y_j)$ has conditional entropy
$(H-\bar K+2)\log d$, strictly greater than that of $Y_i$ or $Y_j$, both equal to $\log d$.
In particular, $Z_i\ne Z_j$.

Thus, we have an injection from $\{1, 2, \bar K+1,\dots,N\}$ into the set of partitions of $Z$
into $d$ subsets. Hence $N-\bar K+2$ is at most the number of such partitions, a finite
quantity depending only on $d$, and $N - H\le N-\bar K+1$ is strictly smaller than it.
Choosing $H_*(d)$ at least equal to this number of partitions and at least 3 proves
the claim. □



4. Random Construction of approximate maximizers
Motivated by (3.9), we introduce the following

Definition 4.1. Let $\mathcal I^c$ be some intricacy and let $x\in[0,1]$ and $d\ge2$.
(1) The entropy-intricacy function $\mathcal I^c(d,x)$ is:

$$\mathcal I^c(d,x) := \sup\Big\{\limsup_{N\to+\infty}\frac{\mathcal I^c(\mu^N)}{N\log d}:\ \mu^N\in\mathcal M(d,N),\ \lim_{N\to+\infty}\frac{H(\mu^N)}{N\log d} = x\Big\}. \qquad (4.1)$$
(2) A sequence of systems $X^N\in\mathcal X(d,N)$, $N\ge2$, is an approximate
x-maximizer for $\mathcal I^c$ if

$$\lim_{N\to\infty}\frac{H(X^N)}{N\log d} = x \quad\text{and}\quad \lim_{N\to\infty}\frac{\mathcal I^c(X^N)}{N\log d} = \mathcal I^c(d,x).$$

(3) $(X^N)_N$ is an approximate maximizer for $\mathcal I^c$ if

$$\lim_{N\to\infty}\frac{\mathcal I^c(X^N)}{N\log d} = \max_{x\in[0,1]}\mathcal I^c(d,x).$$

Proposition 3.3 established that $\mathcal I^c(d,x)\le i^c(x)$. Proposition 4.3 in this section
shows that this inequality is in fact an equality.

In the rest of this section we construct approximate x-maximizers by choosing
uniform distributions on random supports with the appropriate size: since $\frac{H(X^N)}{N\log d}$
must be close to $x$ and $X^N$ is uniform, the size of the (random) support of $X^N$
must be close to $d^{xN}$ (recall that the uniform law on $n$ points has entropy $\log n$).
It turns out that this simple construction yields the desired results.
desired results. 



Remark 4.2. In [2] we have given a different definition of $\mathcal I^c(d,x)$, namely

$$\mathcal I^c(d,x) := \lim_{N\to\infty}\frac{1}{N\log d}\sup\Big\{\mathcal I^c(X):\ X\in\mathcal X(d,N),\ \Big|\frac{H(X)}{N\log d} - x\Big|\le\delta_N\Big\}$$

for any sequence $(\delta_N)_N$ of non-negative numbers converging to zero. It is easy to
see that this definition and (4.1) actually coincide.



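4.1. Sparse random configurations. Let $N\ge2$ and $1\le M\le N$ be integers.
We denote

$$\Lambda_{d,n} := \{0,\dots,d-1\}^n, \qquad \forall n\ge1.$$

We consider a family $(W_i)_{i\in\Lambda_{d,M}}$ of i.i.d. variables, each uniformly distributed on
$\Lambda_{d,N}$, defined on a probability space $(\Omega,\mathcal F,P)$. We define a random probability
measure on $\Lambda_{d,N}$:

$$\mu^{N,M}(x) := d^{-M}\sum_{i\in\Lambda_{d,M}}\mathbb 1_{(x = W_i)}, \qquad x\in\Lambda_{d,N}. \qquad (4.2)$$

In what follows we consider random variables $X^{N,M}$ on $(\Omega,\mathcal F,P)$ such that

$$P\big(X^{N,M} = x\mid(W_i)_{i\in\Lambda_{d,M}}\big) = \mu^{N,M}(x), \qquad x\in\Lambda_{d,N}. \qquad (4.3)$$

In other words, conditionally on $(W_i)_{i\in\Lambda_{d,M}}$, $X^{N,M}$ has law $\mu^{N,M}$.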
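The random measure (4.2) is straightforward to simulate. In the Python sketch below (our illustration; the integer encoding and the seed are assumptions), each configuration of $\Lambda_{d,N}$ is encoded as an integer in $[0, d^N)$ whose base-$d$ digits are the coordinates, first coordinate most significant. We draw the $d^M$ points $W_i$ and compute the entropy of the first $k$ coordinates under $\mu^{N,M}$. For $k$ away from $M$ it is close to $(k\wedge M)\log d$, in agreement with the ideal profile of Section 3.

```python
import math
import random
from collections import Counter


def sample_points(d, N, M, seed=0):
    """The d^M i.i.d. uniform points W_i of {0, ..., d-1}^N, encoded as integers."""
    rng = random.Random(seed)
    return [rng.randrange(d ** N) for _ in range(d ** M)]


def first_k_entropy(points, d, N, k):
    """Entropy (natural log) of the first k coordinates under mu^{N,M}.

    Each point carries mass d^{-M} = 1 / len(points); repeated points add up.
    Integer division by d^(N-k) keeps the k most significant base-d digits.
    """
    n = len(points)
    counts = Counter(w // d ** (N - k) for w in points)
    return -sum(c / n * math.log(c / n) for c in counts.values())


d, N, M = 2, 20, 8
points = sample_points(d, N, M)
for k in (2, 4, 8, 12, 20):
    print(k, round(first_k_entropy(points, d, N, k) / math.log(d), 3))
```

Near $k = M$ the normalized entropy falls visibly below $k\wedge M$; Lemma 4.9 below quantifies how fast this defect disappears away from $M$.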
We are going to prove the following 

Proposition 4.3. For integers $N\ge1$, $1\le M\le N$, let $X^{N,M}$ be the random
systems defined above. Let $x\in[0,1]$ and $M := \lfloor xN\rfloor$. For any intricacy $\mathcal I^c$ we have,
a.s. and in $L^1$,

$$\lim_{N\to+\infty}\frac{\mathcal I^c(X^{N,M})}{N\log d} = i^c(x) \qquad (4.4)$$

and

$$\lim_{N\to+\infty}\frac{H(X^{N,M})}{N\log d} = x. \qquad (4.5)$$

Remark 4.4. We stress that in the following we denote

$$\mathcal I^c(X^{N,M}) := \mathcal I^c(\mu^{N,M}), \qquad H(X^{N,M}) := H(\mu^{N,M}) \qquad (4.6)$$

and that all these expressions are random variables which depend on $(W_i)_{i\in\Lambda_{d,M}}$. In
other words, (4.6) indicates entropy and intricacy of the law of $X^{N,M}$ conditionally
on $(W_i)_{i\in\Lambda_{d,M}}$. This abuse of notation seems necessary to keep notation reasonably
readable. See also Remark 4.6 below.

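4.2. Average intricacy of sparse random configurations. We recall that $N\ge
2$, $M$ is an integer between 1 and $N$ and $\mathcal S_N$ denotes the set of permutations of
$\{1,\dots,N\}$. By Lemma 3.7, $\frac{\mathcal I^c(X^{N,M})}{N\log d} = 2\sum_{k=0}^N c^N_k\binom Nk h_{X^{N,M}}(k/N) - h_{X^{N,M}}(1)$, hence we
get:

$$E\big(\mathcal I^c(X^{N,M})\big) = 2\sum_{k=1}^N c^N_k\binom Nk\frac{1}{N!}\sum_{\sigma\in\mathcal S_N}E\big(H(X^{N,M}_{\sigma(1)},\dots,X^{N,M}_{\sigma(k)})\big) - E\big(H(X^{N,M})\big). \qquad (4.7)$$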

We are going to simplify this expression by exploiting the symmetries of our con- 
struction. 

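Lemma 4.5. The random vector $X^{N,M} = (X^{N,M}_1,\dots,X^{N,M}_N)\in\mathcal X(d,N)$ is
exchangeable, i.e., for all $\sigma\in\mathcal S_N$ and any $\Phi:\Lambda_{d,N}\to\mathbb R$,

$$E\big(\Phi(X^{N,M}_{\sigma(1)},\dots,X^{N,M}_{\sigma(N)})\big) = E\big(\Phi(X^{N,M}_1,\dots,X^{N,M}_N)\big).$$

Proof. Note that every $\sigma\in\mathcal S_N$ induces a permutation $\Sigma_\sigma:\Lambda_{d,N}\to\Lambda_{d,N}$,

$$\Sigma_\sigma(x_1,\dots,x_N) := (x_{\sigma(1)},\dots,x_{\sigma(N)}), \qquad x\in\Lambda_{d,N}.$$

In particular, $(X^{N,M}_{\sigma(1)},\dots,X^{N,M}_{\sigma(N)}) = \Sigma_\sigma(X^{N,M})$ has, conditionally on $(W_i)_{i\in\Lambda_{d,M}}$,
distribution

$$d^{-M}\sum_{i\in\Lambda_{d,M}}\mathbb 1_{(x = \Sigma_\sigma(W_i))}, \qquad x\in\Lambda_{d,N}.$$

However, $(\Sigma_\sigma(W_i))_{i\in\Lambda_{d,M}}$ has the same distribution as $(W_i)_{i\in\Lambda_{d,M}}$. Therefore we
conclude. □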

Remark 4.6. Notice that $\mu^{N,M}\circ\Sigma_\sigma^{-1}$ has the same law as $\mu^{N,M}$, but in general the two
measures are not a.s. equal. In other words, $(X^{N,M}_1,\dots,X^{N,M}_N)$ is exchangeable but
not exchangeable conditionally on $(W_i)_{i\in\Lambda_{d,M}}$.

In particular, for all $k\in\{1,\dots,N\}$ and $\sigma\in\mathcal S_N$,

$$E\big(H(X^{N,M}_{\sigma(1)},\dots,X^{N,M}_{\sigma(k)})\big) = E\big(H(X^{N,M}_1,\dots,X^{N,M}_k)\big), \qquad (4.8)$$

and we obtain by (4.7)

$$E\big(\mathcal I^c(X^{N,M})\big) = 2\sum_{k=1}^N c^N_k\binom Nk E\big(H(X^{N,M}_1,\dots,X^{N,M}_k)\big) - E\big(H(X^{N,M})\big). \qquad (4.9)$$

Lemma 4.7. Let $y\in\Lambda_{d,k}$, $k\in\{1,\dots,N\}$, and set

$$\nu(y) := \sum_{z\in\Lambda_{d,N-k}}\mu^{N,M}(y,z). \qquad (4.10)$$

Then $d^M\cdot\nu(y)$ is a binomial variable with parameters $(d^M, d^{-k})$.

Proof. Notice that, conditionally on $(W_i)_{i\in\Lambda_{d,M}}$, $X^{N,M}_{[1,k]} := (X^{N,M}_1,\dots,X^{N,M}_k)\in
\Lambda_{d,k}$ has distribution $\nu$, where $y\in\Lambda_{d,k}$ and $(y,z)\in\Lambda_{d,k}\times\Lambda_{d,N-k} = \Lambda_{d,N}$. For fixed $y\in\Lambda_{d,k}$, the family

$$T_i := \sum_{z\in\Lambda_{d,N-k}}\mathbb 1_{((y,z) = W_i)}, \qquad i\in\Lambda_{d,M},$$

is an i.i.d. family of Bernoulli variables with parameter $d^{-k}$. Indeed, if $\Pi_{N,k}:
\Lambda_{d,N}\to\Lambda_{d,k}$ is the natural projection, then the law of $\Pi_{N,k}(W_i)$ is uniform on $\Lambda_{d,k}$,
so that

$$P(T_i = 1) = P(\Pi_{N,k}(W_i) = y) = d^{-k}.$$

Hence, for all $y\in\Lambda_{d,k}$, $d^M\cdot\nu(y) = \sum_{i\in\Lambda_{d,M}}T_i$ is a sum of $d^M$ independent
Bernoulli variables with parameter $d^{-k}$, i.e., a binomial variable with parameters
$(d^M, d^{-k})$. □
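Lemma 4.7 can be observed in simulation. The Python sketch below (our illustration; the integer encoding, trial count and seed are assumptions) repeatedly draws the $d^M$ points $W_i$, counts those whose first $k$ coordinates equal a fixed $y$, i.e. the quantity $d^M\nu(y)$, and compares the empirical mean and variance with $d^{M-k}$ and $d^{M-k}(1-d^{-k})$. This is a sanity check, not a substitute for the proof.

```python
import random
from statistics import mean, pvariance


def prefix_counts(d, N, M, k, y=0, trials=2000, seed=0):
    """Samples of d^M * nu(y): the number of W_i whose first k coordinates equal y.

    Points of {0, ..., d-1}^N are encoded as integers in [0, d^N); their first k
    coordinates are the k most significant base-d digits, i.e. w // d^(N-k).
    The prefix y is likewise an integer in [0, d^k).
    """
    rng = random.Random(seed)
    shift = d ** (N - k)
    samples = []
    for _ in range(trials):
        count = sum(1 for _ in range(d ** M) if rng.randrange(d ** N) // shift == y)
        samples.append(count)
    return samples


d, N, M, k = 2, 10, 8, 3
samples = prefix_counts(d, N, M, k)
n, p = d ** M, d ** (-k)
print(mean(samples), n * p)                 # empirical vs binomial mean
print(pvariance(samples), n * p * (1 - p))  # empirical vs binomial variance
```

With these parameters the binomial mean is 32 and the variance is 28.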

Let us denote from now on by Bk a binomial variable with parameters {d'^,d~''). 
Set 

logd 

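Notice that the function $\psi(x) := -(1+x)\log(1+x) + x + \frac{x^2}{2}$ satisfies

$$\psi(0) = \psi'(0) = 0, \qquad \psi''(x) > 0 \quad \forall\, x > 0,$$

since $\psi'(x) = x - \log(1+x)$ and $\psi''(x) = x/(1+x)$, so that $\psi(x) > 0$ for $x > 0$. As $\varphi(1+x) = (\psi(x) - x - x^2/2)/\log d$, this bounds $\varphi(1+x)$ from below for $x > 0$. Moreover, $\varphi(1+x) \ge 0$ if $x \in [-1, 0]$. Hence, for all $x \ge -1$,

$$\varphi(1+x) \ge -\mathbb{1}_{(x>0)}\, \frac{x + x^2/2}{\log d}. \tag{4.11}$$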

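Now, conditionally on $(W_i)_{i \in \Lambda_d^M}$, the law of $(X^{M,N}_1, \dots, X^{M,N}_k)$ is $u$, so that

$$\mathrm{H}(X^{M,N}_1, \dots, X^{M,N}_k) = -\sum_{y \in \Lambda_d^k} u(y) \log u(y),$$

and we obtain by Lemma 4.7 that

$$h_k := \frac{\mathbb{E}\big(\mathrm{H}(X^{M,N}_1, \dots, X^{M,N}_k)\big)}{\log d} = d^k\, \mathbb{E}\big(\varphi(B_k\, d^{-M})\big). \tag{4.12}$$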
Lemma 4.8. We have, for any $0 \le k \le N$,

$$h_k = k + \mathbb{E}\big(\varphi(B_k\, d^{k-M})\big) = M + d^{k-M}\, \mathbb{E}\big(\varphi(B_k)\big).$$

Proof. These identities follow from the formulae $\mathbb{E}(B_k) = d^{M-k}$, $\varphi(d^{-j}) = j\, d^{-j}$ and $\varphi(ax) = a\varphi(x) + x\varphi(a)$ for $a > 0$, applied to $\varphi(B_k d^{k-M} \cdot d^{-k})$ and $\varphi(B_k \cdot d^{-M})$. □

Lemma 4.9. Away from $k = M$, $h_k$ is close to $k \wedge M$:

$$k - 2d^{-(M-k)/2} \le h_k \le k, \qquad k = 1, \dots, M,$$

$$M - d^{M-k} \le h_k \le M, \qquad k = M+1, \dots, N.$$

Proof. The upper bounds are easy. Indeed, for $k \le M$ one uses (A.1), while for $k > M$ we notice that the support of $\mu^{M,N}$ has cardinality at most $d^M$, and apply (A.1) to conclude.

Recall that $B_k$ is binomial with parameters $(d^M, d^{-k})$. Then $\mathbb{E}(B_k) = d^{M-k}$ and $\mathrm{Var}(B_k) = d^M d^{-k}(1 - d^{-k})$. If we define $J_k := B_k \cdot d^{k-M} - 1$, then we obtain

$$\mathbb{E}(J_k^2) = d^{2(k-M)}\, \mathrm{Var}(B_k) = d^{k-M} - d^{-M} \le d^{k-M}.$$

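Hence, using (4.11), we get

$$\mathbb{E}\big(\varphi(B_k \cdot d^{k-M})\big) = \mathbb{E}\big(\varphi(1 + J_k)\big) \ge -\frac{1}{\log d}\, \mathbb{E}\Big(\mathbb{1}_{(J_k>0)}\Big(J_k + \frac{J_k^2}{2}\Big)\Big) \ge -\mathbb{E}\big(|J_k| + J_k^2\big) \ge -\Big(\sqrt{\mathbb{E}(J_k^2)} + \mathbb{E}(J_k^2)\Big) \ge -2d^{(k-M)/2},$$

since $\mathbb{E}(|J_k|) \le \sqrt{\mathbb{E}(J_k^2)}$ by Cauchy-Schwarz and $d^{k-M} \le d^{(k-M)/2} \le 1$. By Lemma 4.8 we obtain the desired lower bound for $k \le M$.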
Let us consider now the regime $k > M$. We have

$$h_k = d^k\, \mathbb{E}\big(\varphi(B_k\, d^{-M})\big) = M + d^{k-M}\, \mathbb{E}\big(\varphi(B_k)\big).$$

If $B_k \in \{0, 1\}$ then $\varphi(B_k) = 0$. Note that (4.11) implies that $\varphi(B_k) \ge -(B_k - 1)^2 - (B_k - 1)$, as the right-hand side is zero whenever $B_k = 0, 1$ and less than $-(B_k - 1)^2/2 - (B_k - 1)$ otherwise. Thus,

$$d^{k-M}\, \mathbb{E}\big(\varphi(B_k)\big) \ge -d^{k-M}\, \mathbb{E}\big((B_k - 1)^2 + (B_k - 1)\big) = -d^{k-M}\, \mathbb{E}(B_k^2 - B_k) = -d^{M-k} + d^{-k} \ge -d^{M-k}.$$

By Lemma 4.8 we obtain the lower bound for $k > M$. □
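Since only the law of $B_k$ enters (4.12), the numbers $h_k$ can be computed exactly for small parameters. The sketch below uses hypothetical helper names; it evaluates both expressions of Lemma 4.8 and returns them next to the bounds of Lemma 4.9. The binomial weights are computed in log-space so that $\binom{d^M}{j}$ never has to be formed as a float.

```python
import math


def phi(x, d):
    """phi(x) = -x log x / log d, extended by phi(0) = 0."""
    return 0.0 if x == 0 else -x * math.log(x) / math.log(d)


def binom_pmf(n, p, j):
    """P(Bin(n, p) = j), computed in log-space so that C(n, j) never overflows."""
    if p == 1:
        return 1.0 if j == n else 0.0
    log_pmf = (math.lgamma(n + 1) - math.lgamma(j + 1) - math.lgamma(n - j + 1)
               + j * math.log(p) + (n - j) * math.log1p(-p))
    return math.exp(log_pmf)


def h(k, d, M):
    """(4.12): h_k = d^k E(phi(B_k d^{-M})) with B_k ~ Bin(d^M, d^{-k})."""
    n, p = d ** M, d ** (-k)
    return d ** k * sum(binom_pmf(n, p, j) * phi(j / n, d) for j in range(n + 1))


def h_lemma_4_8(k, d, M):
    """Second expression of Lemma 4.8: M + d^{k-M} E(phi(B_k))."""
    n, p = d ** M, d ** (-k)
    return M + d ** (k - M) * sum(binom_pmf(n, p, j) * phi(j, d) for j in range(n + 1))


def lemma_4_9_lower(k, d, M):
    """Lower bounds of Lemma 4.9 (first line for k <= M, second for k > M)."""
    return k - 2 * d ** ((k - M) / 2) if k <= M else M - d ** (M - k)


d, M, N = 2, 6, 12
# Rows (k, lower bound, h_k, upper bound min(k, M)).
table = [(k, lemma_4_9_lower(k, d, M), h(k, d, M), min(k, M)) for k in range(N + 1)]
```

The table shows $h_k$ tracking $k \wedge M$, with the largest deficit near $k = M$, as the lemma predicts.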

4.3. Estimation of the expected Intricacy. 

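Lemma 4.10. Let $x \in\, ]0, 1[$, $M := \lfloor xN \rfloor$ and $a := d^{1/2} > 1$. For all $N \ge 2$, using the notation (3.1),

$$-2d^{-(N-M)} \le \mathbb{E}\big(2(D_N \wedge M) - M\big) - \frac{\mathbb{E}\big(\mathcal{I}^c(X^{M,N})\big)}{\log d} \le \mathbb{E}\big(4a^{-|D_N - M|}\big). \tag{4.13}$$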
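Proof. By (4.9), (4.12) and (3.1),

$$\frac{\mathbb{E}\big(\mathcal{I}^c(X^{M,N})\big)}{\log d} = 2 \sum_{k=0}^{N} \mathbb{P}(D_N = k)\, h_k - h_N = 2\,\mathbb{E}(h_{D_N}) - h_N.$$

So,

$$\frac{\mathbb{E}\big(\mathcal{I}^c(X^{M,N})\big)}{\log d} - \mathbb{E}\big(2(D_N \wedge M) - M\big) = 2\,\mathbb{E}\big(h_{D_N} - D_N \wedge M\big) + M - h_N.$$

We conclude by Lemma 4.9, which gives $0 \le k \wedge M - h_k \le 2a^{-|k-M|}$ for $0 \le k \le N$ and $0 \le M - h_N \le d^{-(N-M)}$. □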
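As a numerical sanity check of (4.13), the sketch below takes the Lebesgue measure for $\lambda_c$ (the Edelman-Sporns-Tononi case, see Remark 5.5), for which $\mathbb{P}(D_N = k) = \int_0^1 \binom{N}{k} t^k (1-t)^{N-k}\,dt = 1/(N+1)$, and uses the identity $\mathbb{E}(\mathcal{I}^c(X^{M,N}))/\log d = 2\,\mathbb{E}(h_{D_N}) - h_N$ from the proof above. It checks $-2d^{-(N-M)} \le \mathbb{E}(2(D_N \wedge M) - M) - \mathbb{E}(\mathcal{I}^c(X^{M,N}))/\log d \le 4\,\mathbb{E}(a^{-|D_N - M|})$ for a few small parameters; the helper names are illustrative only.

```python
import math


def phi(x, d):
    """phi(x) = -x log x / log d, with phi(0) = 0."""
    return 0.0 if x == 0 else -x * math.log(x) / math.log(d)


def binom_pmf(n, p, j):
    """P(Bin(n, p) = j), computed in log-space."""
    if p == 1:
        return 1.0 if j == n else 0.0
    return math.exp(math.lgamma(n + 1) - math.lgamma(j + 1) - math.lgamma(n - j + 1)
                    + j * math.log(p) + (n - j) * math.log1p(-p))


def h(k, d, M):
    """h_k = d^k E(phi(B_k d^{-M})) with B_k ~ Bin(d^M, d^{-k}), as in (4.12)."""
    n, p = d ** M, d ** (-k)
    return d ** k * sum(binom_pmf(n, p, j) * phi(j / n, d) for j in range(n + 1))


def check_4_13(d, M, N):
    """Return (lower, gap, upper) for (4.13) when lambda_c is the Lebesgue measure."""
    law = [1 / (N + 1)] * (N + 1)  # P(D_N = k) = 1/(N+1) for every k
    hs = [h(k, d, M) for k in range(N + 1)]
    expected_intricacy = 2 * sum(q * hk for q, hk in zip(law, hs)) - hs[N]  # E(I^c)/log d
    ideal = sum(q * (2 * min(k, M) - M) for k, q in enumerate(law))         # E(2(D_N ^ M) - M)
    a = math.sqrt(d)
    upper = 4 * sum(q * a ** (-abs(k - M)) for k, q in enumerate(law))
    return -2 * d ** (-(N - M)), ideal - expected_intricacy, upper


lower, gap, upper = check_4_13(2, 5, 10)
```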



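Lemma 4.11. Let $x \in\, ]0, 1[$ and $M := \lfloor xN \rfloor$. Then for $a := d^{1/2} > 1$ and some constant $C > 0$ we have

$$\mathbb{E}\big(a^{-|D_N - M|}\big) \le \frac{C}{\sqrt{N}}, \qquad \forall\, N \ge 1. \tag{4.14}$$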



Proof. To ease notation, in this proof we drop the subscript $c$ from $W_c$. By (2.2) and (3.2), we have that

$$\mathbb{P}(D_N = k) = \binom{N}{k}\, \mathbb{E}\big(W^k (1 - W)^{N-k}\big), \qquad k = 0, \dots, N. \tag{4.15}$$

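We claim that for all $1 \le k \le \lfloor N/2 \rfloor$ we have

$$\mathbb{P}(D_N = k - 1) \le \mathbb{P}(D_N = k).$$

Indeed,

$$\mathbb{P}(D_N = k) - \mathbb{P}(D_N = k-1) = \frac{N!}{k!\,(N-k+1)!}\, \mathbb{E}\Big(W^{k-1}(1-W)^{N-k}\big[(N+1-k)W - k(1-W)\big]\Big) = \frac{N!}{k!\,(N-k+1)!}\, \mathbb{E}\Big(W^{k-1}(1-W)^{N-k}\big[(N+1)W - k\big]\Big).$$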

By the symmetry of $\lambda_c$ w.r.t. $1/2$ we have, since $2k \le N + 1$,

$$\mathbb{E}\Big(W^{k-1}(1-W)^{N-k}\big[(N+1)W - k\big]\Big) = 2^{-N+1}\Big(\frac{N+1}{2} - k\Big)\,\lambda_c(\{1/2\}) + \mathbb{E}\Big(W^{k-1}(1-W)^{N-k}\big[(N+1)W - k\big]\,\mathbb{1}_{(W>1/2)}\Big) + \mathbb{E}\Big((1-W)^{k-1}W^{N-k}\big[(N+1)(1-W) - k\big]\,\mathbb{1}_{(W>1/2)}\Big)$$

$$\ge \mathbb{E}\Big(\big(W(1-W)\big)^{k-1}\,\mathbb{1}_{(W>1/2)}\Big[\big((N+1)W - k\big)(1-W)^{N-2k+1} + \big((N+1)(1-W) - k\big)W^{N-2k+1}\Big]\Big).$$

This expectation is nonnegative. Indeed, the first term is nonnegative as $2k \le N + 1$ and $W > 1/2$. We assume the second term to be negative, otherwise we are done. As $1 - W < W$, we get

$$\mathbb{P}(D_N = k) - \mathbb{P}(D_N = k-1) \ge \frac{N!}{k!\,(N-k+1)!}\, \mathbb{E}\Big(\mathbb{1}_{(W>1/2)}\big(W(1-W)\big)^{k-1}\big((N+1) - 2k\big)W^{N-2k+1}\Big) \ge 0,$$

proving the claim.

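We set $L := \lfloor N/2 \rfloor$. Then $N - L \ge L \ge N/2 - 1$ and by (4.15) we obtain, for all $0 \le k \le N$,

$$\mathbb{P}(D_N = k) \le \mathbb{P}(D_N = L) = \frac{N!}{L!\,(N-L)!}\, \mathbb{E}\big(W^L (1-W)^{N-L}\big) \le \frac{N!}{L!\,(N-L)!}\, \mathbb{E}\big((W(1-W))^{L}\big) \le \frac{N!}{L!\,(N-L)!}\, 2^{-N+2},$$

where for $k > L$ we used the symmetry $\mathbb{P}(D_N = k) = \mathbb{P}(D_N = N-k)$, and in the last step $W(1-W) \le 1/4$ and $2L \ge N - 2$. By Stirling's formula $n! = \sqrt{2\pi n}\,(n/e)^n\,(1 + O(1/n))$, there is a constant $C > 0$ such that

$$\frac{N!}{L!\,(N-L)!}\, 2^{-N} \le C N^{-1/2}.$$

Then, we obtain for some constants $C_1, C_2 > 0$

$$\mathbb{E}\big(a^{-|D_N - M|}\big) = \sum_{k=0}^{N} \mathbb{P}(D_N = k)\, a^{-|k-M|} \le C_1 N^{-1/2} \sum_{k \in \mathbb{Z}} a^{-|k|} = C_2 N^{-1/2},$$

and the proof is finished. □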
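Formula (4.15) makes the law of $D_N$ easy to tabulate when $\lambda_c$ is purely atomic, for instance $\frac12(\delta_p + \delta_{1-p})$. The sketch below assumes $\lambda_c$ is passed as a dictionary `{t: weight}` (a hypothetical input format) and evaluates the quantity $\mathbb{E}(a^{-|D_N - M|})$ of Lemma 4.11. For $\lambda_c = \delta_{1/2}$ and $x = 1/2$ it reports $\sqrt{N}\,\mathbb{E}(a^{-|D_N - M|})$, which stays bounded as $N$ grows, in line with (4.14).

```python
import math


def law_of_D(N, atoms):
    """(4.15): P(D_N = k) = C(N, k) E(W^k (1 - W)^(N - k)) with W ~ lambda_c.

    Hypothetical input format: lambda_c is a dict {t: weight} of atoms.
    math.comb is exact, but the int -> float conversion overflows for N beyond ~1000.
    """
    return [math.comb(N, k) * sum(w * t ** k * (1 - t) ** (N - k) for t, w in atoms.items())
            for k in range(N + 1)]


def expected_decay(N, x, d, atoms):
    """E(a^{-|D_N - M|}) with a = d^{1/2} and M = floor(x N), as in Lemma 4.11."""
    a, M = math.sqrt(d), math.floor(x * N)
    return sum(q * a ** (-abs(k - M)) for k, q in enumerate(law_of_D(N, atoms)))


half = {0.5: 1.0}  # lambda_c = delta_{1/2}
scaled = {N: math.sqrt(N) * expected_decay(N, 0.5, 2, half) for N in (100, 200, 400)}
# scaled[N] stays bounded as N grows, as (4.14) predicts
```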



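Proof of Proposition 4.3. Let $x \in\, ]0, 1[$ and $M := \lfloor xN \rfloor \ge 1$ ($N$ is large). By Lemma 4.9 for $k = N$, and since $\mathrm{H}(X^{M,N}) \le M \log d$,

$$\mathbb{E}\left(\left|\frac{M}{N} - \frac{\mathrm{H}(X^{M,N})}{N \log d}\right|\right) = \frac{M - h_N}{N} \le \frac{d^{M-N}}{N}.$$

Thus,

$$\sum_{N \ge 1} \mathbb{E}\left(\left|\frac{M}{N} - \frac{\mathrm{H}(X^{M,N})}{N \log d}\right|\right) < +\infty, \tag{4.16}$$

therefore a.s.

$$\sum_{N \ge 1} \left|\frac{M}{N} - \frac{\mathrm{H}(X^{M,N})}{N \log d}\right| < +\infty,$$

and in particular a.s.

$$\lim_{N \to \infty} \left|\frac{M}{N} - \frac{\mathrm{H}(X^{M,N})}{N \log d}\right| = 0.$$

Setting $x_N := \mathrm{H}(X^{M,N})/(N \log d)$, we have obtained

$$\lim_{N \to \infty} x_N = \lim_{N \to \infty} \frac{\lfloor xN \rfloor}{N} = x$$

a.s. and in $L^1$, namely we have proven (4.5). Now, let us observe that, by Proposition 3.3, this gives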

again a.s. and in $L^1$. On the other hand, by (3.5) and by Lemmas 4.10 and 4.11,

JV>1 ^ & / ^r^i 



N>1 



Arguing as above, it follows that $\mathcal{I}^c(X^{M,N})/(N \log d) \to i^c(x)$ a.s. and in $L^1$. This proves (4.4) and concludes the proof of Proposition 4.3. □

5. Results for Arbitrary Intricacies 

We now collect our results to state the generalizations of Theorems 1.1 and 1.5 for arbitrary intricacies. We consider some non-null intricacy $\mathcal{I}^c$. Let $\lambda_c$ be the associated probability measure on $[0, 1]$ according to Proposition 2.2. Recall from Def. 4.1 that the corresponding entropy-intricacy function $\mathbf{I}^c(d, x)$ is:

$$\mathbf{I}^c(d, x) := \sup\left\{ \limsup_{N \to \infty} \frac{\mathcal{I}^c(X^N)}{N \log d} \;:\; (X^N)_{N \ge 1},\ X^N \in \mathcal{X}(d, N) \text{ s.t. } \lim_{N \to \infty} \frac{\mathrm{H}(X^N)}{N \log d} = x \right\}.$$

We also recall that $i^c(x)$ and $i^c_N(x)$ have been defined in eq. (3.5) and (3.6).

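Theorem 5.1.

(1) For any $N \ge 1$ and $X \in \mathcal{X}(d, N)$,

$$\frac{\mathcal{I}^c(X)}{N \log d} \le i^c_N(x) = 2 \sum_{k=0}^{N} \mathbb{P}(D_N = k) \Big(\frac{k}{N} \wedge x\Big) - x, \qquad x := \frac{\mathrm{H}(X)}{N \log d}.$$

(2) $\mathbf{I}^c(d, x) = i^c(x) = 2 \int_0^1 x \wedge t \; \lambda_c(dt) - x$. The function $i^c$ is Lipschitz with constant 1, concave and symmetric: $i^c(1 - x) = i^c(x)$.

(3) $i^c(1/2) = \max_{x \in [0,1]} i^c(x)$. Moreover, $1/2$ is the unique maximum if and only if $1/2$ belongs to the support of $\lambda_c$.

(4) $|i^c(x) - i^c_N(x)| \le N^{-1/2}$ for all $x \in [0, 1]$.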

Theorem 5.1 immediately follows from Propositions 3.3 and 4.3. We now consider the convergence of entropy profiles to the ideal profiles for approximate maximizers, i.e., the generalization of Theorem 1.5.
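For a purely atomic $\lambda_c$, both $i^c$ and its finite-$N$ counterpart are finite sums, which makes points (2) and (4) easy to probe. The sketch below assumes $i^c_N(x) = 2\sum_{k=0}^{N} \mathbb{P}(D_N = k)\,(\frac{k}{N} \wedge x) - x$ with $\mathbb{P}(D_N = k)$ given by (4.15); the helper names and the dictionary format for $\lambda_c$ are illustrative assumptions. For the Lebesgue measure, $\int_0^1 x \wedge t\,dt = x - x^2/2$, so $i^c(x) = x(1 - x)$, which a midpoint discretisation recovers.

```python
import math


def law_of_D(N, atoms):
    """(4.15) for an atomic lambda_c given as {t: weight}."""
    return [math.comb(N, k) * sum(w * t ** k * (1 - t) ** (N - k) for t, w in atoms.items())
            for k in range(N + 1)]


def i_c(x, atoms):
    """i^c(x) = 2 * integral of min(x, t) lambda_c(dt) - x."""
    return 2 * sum(w * min(x, t) for t, w in atoms.items()) - x


def i_c_N(x, N, atoms):
    """Assumed finite-N version: 2 * sum_k P(D_N = k) * min(k / N, x) - x."""
    return 2 * sum(q * min(k / N, x) for k, q in enumerate(law_of_D(N, atoms))) - x


def lebesgue_atoms(m):
    """Midpoint discretisation of the Lebesgue measure on [0, 1] with m atoms."""
    return {(j + 0.5) / m: 1 / m for j in range(m)}


atoms = {0.3: 0.5, 0.7: 0.5}
grid = [j / 100 for j in range(101)]
# Largest gap |i^c - i^c_N| over the grid, to compare with N^{-1/2} from point (4).
worst = {N: max(abs(i_c(x, atoms) - i_c_N(x, N, atoms)) for x in grid) for N in (10, 50, 200)}
```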

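Theorem 5.2. Let $\mathcal{I}^c$ be an intricacy.

(1) If $1/2$ is in the support of $\lambda_c$, then any approximate maximizer $(X^N)_N$ for $\mathcal{I}^c$ satisfies:

$$\lim_{N \to \infty} \frac{\mathrm{H}(X^N)}{N \log d} = \frac{1}{2}, \qquad \lim_{N \to \infty} \sup_{[0,1]} \big|h_{X^N} - h^*_{1/2}\big| = 0. \tag{5.1}$$

(2) Let $x \in [0, 1]$ and let $(X^N)_N$ be an approximate $x$-maximizer for $\mathcal{I}^c$. Then:

$$\lim_{N \to \infty} \sup_{\mathrm{supp}(\lambda_c)} \big|h_{X^N} - h^*_x\big| = 0. \tag{5.2}$$

In particular, if $x \in \mathrm{supp}(\lambda_c)$ then

$$\lim_{N \to \infty} \sup_{[0,1]} \big|h_{X^N} - h^*_x\big| = 0. \tag{5.3}$$

(3) If $x \in \mathrm{supp}(\lambda_c)$ then an approximate $x$-maximizer $(X^N)_N$ for $\mathcal{I}^c$ is an approximate $x$-maximizer for any other intricacy $\mathcal{I}^{c'}$.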

Remark 5.3. The extra assumption about the support of $\lambda_c$ cannot be dropped. Indeed, for point (1) of Theorem 5.2, observe that, in the case of the $p$-symmetric intricacy, $1/2$ is not in the support of $\frac{1}{2}(\delta_p + \delta_{1-p})$ for $p \ne 1/2$ (say $p < 1/2$), and approximate maximizers of this intricacy satisfy only

$$p \le \liminf_{N \to \infty} \frac{\mathrm{H}(X^N)}{N \log d} \le \limsup_{N \to \infty} \frac{\mathrm{H}(X^N)}{N \log d} \le 1 - p.$$

The entropy may accumulate on any point of the interval $[p, 1-p]$.

Notice however that for many intricacies, including the neural complexity, the support of $\lambda_c$ is the whole interval, making this assumption satisfied for all $x \in [0, 1]$.
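Remark 5.3 can be checked by hand: with $\lambda_c = \frac12(\delta_p + \delta_{1-p})$ and $p < 1/2$, the formula $i^c(x) = 2\int_0^1 x \wedge t\,\lambda_c(dt) - x$ gives $i^c(x) = \min\{x, p, 1 - x\}$, which equals $p$ on the whole interval $[p, 1-p]$. The short grid search below (illustrative only, with a hypothetical function name) confirms that the maximizers fill that interval, so maximizing this intricacy does not pin down the entropy.

```python
def i_p_symmetric(x, p):
    """i^c(x) for lambda_c = (delta_p + delta_{1-p}) / 2: min(x, p) + min(x, 1 - p) - x."""
    return min(x, p) + min(x, 1 - p) - x


p = 0.3
grid = [j / 1000 for j in range(1001)]
best = max(i_p_symmetric(x, p) for x in grid)
maximizers = [x for x in grid if abs(i_p_symmetric(x, p) - best) < 1e-12]
# maximizers[0] and maximizers[-1] are p and 1 - p: the plateau [p, 1 - p] of Remark 5.3
```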

Remark 5.4. In the setting of point (2) of Theorem 5.2, if $x$ does not belong to the support of $\lambda_c$, then one can prove with similar arguments that

$$\lim_{N \to \infty} \sup_{[0,a] \cup [b,1]} \big|h_{X^N} - h^*_x\big| = 0, \tag{5.4}$$

where $a := \sup([0, x] \cap \mathrm{supp}(\lambda_c))$, $b := \inf([x, 1] \cap \mathrm{supp}(\lambda_c))$, with the convention $\sup \emptyset := 0$ and $\inf \emptyset := 1$.

Remark 5.5. Let $X^{M,N}$ be the random system constructed in (4.2) and (4.3), with $M := \lfloor xN \rfloor$, $x \in\, ]0, 1[$. For the Edelman-Sporns-Tononi intricacy $\mathcal{I}$ we have that the support of the associated probability measure $\lambda$ is $[0, 1]$, since it is the Lebesgue measure by Lemma 3.8 of [2]. Since $(X^{M,N})_N$ is a.s. an approximate $x$-maximizer for $\mathcal{I}$ by Proposition 4.3, by point (3) of Theorem 5.2 this sequence is a.s. an approximate $x$-maximizer simultaneously for all intricacies $\mathcal{I}^c$.

This has the following consequence for approximate maximizers (i.e., without entropy constraints). An approximate maximizer for some intricacy $\mathcal{I}^c$ where $1/2 \notin \mathrm{supp}(\lambda_c)$ is not necessarily an approximate maximizer for another intricacy. But an approximate 1/2-maximizer for any intricacy is automatically an approximate maximizer for all intricacies.

Proof of Theorem 5.2. Let us set for simplicity of notation:

$$x_N := \frac{\mathrm{H}(X^N)}{N \log d}, \qquad \iota_N := \frac{\mathcal{I}^c(X^N)}{N \log d}.$$

If $1/2$ is in the support of $\lambda_c$, then it is the unique point where $i^c(x)$ achieves its maximum. Then, Theorem 5.1 implies that no $x \ne 1/2$ can be an accumulation point of $(x_N)_{N \ge 1}$. Thus an approximate maximizer is an approximate 1/2-maximizer.
It is now enough to prove point (2). By definition of approximate $x$-maximizers, $x_N \to x$ and $\iota_N \to i^c(x)$. Using point (2) of Proposition 3.3, it follows that $|\iota_N - i^c_N(x_N)| \to 0$, and by (3.8) we have $\|h_{X^N} - h^*_x\|_{c,N} \to 0$. Notice now that, for any $K$-Lipschitz function $f : [0, 1] \to \mathbb{R}$, by (3.10),

$$\big|\mathbb{E}\big(f(D_N/N)\big) - \mathbb{E}\big(f(W_c)\big)\big| \le \frac{K}{\sqrt{N}}.$$

As entropy profiles are 1-Lipschitz, we obtain

$$\int_0^1 \big|h_{X^N} - h^*_x\big| \, d\lambda_c \to 0.$$

As all functions $|h_{X^N} - h^*_x|$ are 2-Lipschitz, (5.2) follows by a routine argument.

Assuming now $x \in \mathrm{supp}(\lambda_c)$, $\lim_{N \to \infty} h_{X^N}(x) = h^*_x(x) = x$. On the one hand, as $h_{X^N}(0) = 0$ and $h_{X^N}$ is 1-Lipschitz, it follows that the convergence $\lim_{N \to \infty} h_{X^N}(t) = h^*_x(t) = t$ occurs for all $t \in [0, x]$. On the other hand, all $h_{X^N}$ being non-decreasing, for $t \in [x, 1]$ we have $h_{X^N}(x) \le h_{X^N}(t) \le h_{X^N}(1) = x_N \to x$. Hence the previous convergence occurs for all $t \in [0, 1]$, proving (5.3).

Finally, let us prove point (3). By (5.3) the profiles $h_{X^N}$ converge to $h^*_x$ uniformly on $[0, 1]$. Let $\mathcal{I}^{c'}$ be any other intricacy. By uniform convergence we have $\|h_{X^N} - h^*_x\|_{c',N} \le \sup_{[0,1]} |h_{X^N} - h^*_x| \to 0$. By (3.8) and Lemma 3.…, it follows that

$$\frac{\mathcal{I}^{c'}(X^N)}{N \log d} \to i^{c'}(x), \qquad N \to +\infty.$$

By Theorem 5.1, $i^{c'}(x) = \mathbf{I}^{c'}(d, x)$ and therefore $(X^N)_N$ is an approximate $x$-maximizer for $\mathcal{I}^{c'}$. □

We have the following consequence for approximate $x$-maximizers. We recall that $\mathrm{H}(Y \mid Z)$ denotes the conditional entropy, see the Appendix.

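Theorem 5.6. Let $0 < x < 1$ with $x \in \mathrm{supp}(\lambda_c)$, and let $(X^N)_N$ be an approximate $x$-maximizer for $\mathcal{I}^c$. Then:

(1) If $y \in\, ]0, x]$ then for all $\varepsilon > 0$

$$\lim_{N \to +\infty} \binom{N}{\lfloor yN \rfloor}^{-1} \#\Big\{S \subset \{1, \dots, N\} : |S| = \lfloor yN \rfloor,\ (1 - \varepsilon)\,|S| \log d \le \mathrm{H}(X^N_S) \le |S| \log d\Big\} = 1.$$

(2) If $y \in [x, 1[$ then for all $\varepsilon > 0$

$$\lim_{N \to +\infty} \binom{N}{\lfloor yN \rfloor}^{-1} \#\Big\{S \subset \{1, \dots, N\} : |S| = \lfloor yN \rfloor,\ \mathrm{H}(X^N \mid X^N_S) \le \varepsilon x N \log d\Big\} = 1.$$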

This result can be loosely interpreted as follows: as $N \to +\infty$,

(1) if $y \in\, ]0, x]$ then for almost all subsets $S$ with $|S| = \lfloor yN \rfloor$, $X_S$ is almost uniform;

(2) if $y \in [x, 1[$ then for almost all subsets $S$ with $|S| = \lfloor yN \rfloor$, $X$ is almost a function of $X_S$.

This follows from the relation between entropy and conditional entropy on one side 
and independence versus dependence on the other side, see the Appendix. 
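To experiment with Theorem 5.6 on small examples, one can compute entropy profiles directly by enumerating subsets. The sketch below assumes the normalisation $h_X(k/N) = \binom{N}{k}^{-1}\sum_{|S|=k}\mathrm{H}(X_S)/(N\log d)$ and a system $X$ that is uniform on a given list of points of $\Lambda_d^N$; the function name is hypothetical, and the exhaustive enumeration restricts it to small $N$.

```python
import math
from collections import Counter
from itertools import combinations


def entropy_profile(points, N, d):
    """Return [h_X(k/N) for k = 0..N] for X uniform on `points` (a list in Lambda_d^N).

    Normalisation assumed here: h_X(k/N) = C(N,k)^{-1} sum_{|S|=k} H(X_S) / (N log d).
    Exhaustive over all subsets S, so only meant for small N.
    """
    total = len(points)
    profile = []
    for k in range(N + 1):
        acc = 0.0
        for S in combinations(range(N), k):
            counts = Counter(tuple(pt[i] for i in S) for pt in points)
            acc -= sum((c / total) * math.log(c / total) for c in counts.values())
        profile.append(acc / (math.comb(N, k) * N * math.log(d)))
    return profile


N, d = 4, 2
cube = [tuple((m >> i) & 1 for i in range(N)) for m in range(2 ** N)]  # uniform on {0,1}^4
copies = [(0,) * N, (1,) * N]                                           # X_1 = ... = X_N
```

The full cube gives the profile $k/N$, i.e. every $X_S$ is exactly uniform, while the copy system has profile $1/N$ for every $k \ge 1$, i.e. $X$ is a function of any single coordinate.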

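Proof of Theorem 5.6. Let $y \in\, ]0, 1[$. By (5.3), $h_{X^N}(y) \to h^*_x(y) = x \wedge y$ as $N \to +\infty$. By the definition (1.4) of $h_{X^N}$, we obtain, setting $k_N := \lfloor yN \rfloor$,

$$\frac{1}{\binom{N}{k_N}} \sum_{|S| = k_N} \left|h^*_x(y) - \frac{\mathrm{H}(X^N_S)}{N \log d}\right| = \frac{1}{\binom{N}{k_N}} \sum_{|S| = k_N} \left(h^*_x(y) - \frac{\mathrm{H}(X^N_S)}{N \log d}\right) = h^*_x(y) - h_{X^N}(y) \to 0,$$

since all terms in the sum are non-negative by Lemma 3.6. Let $Z_N$, defined on $(\Omega, \mathcal{F}, \mathbb{P})$, be a random subset of $\{1, \dots, N\}$ defined by

$$\mathbb{P}(Z_N = S) := \binom{N}{k_N}^{-1} \ \text{ if } |S| = k_N, \qquad \mathbb{P}(Z_N = S) := 0 \ \text{ otherwise}.$$

Then the above formula can be rewritten as follows:

$$\lim_{N \to \infty} \mathbb{E}\left(\left|h^*_x(y) - \frac{\mathrm{H}(X^N_{Z_N})}{N \log d}\right|\right) = 0.$$

Since $L^1$ convergence implies convergence in probability, we obtain

$$\lim_{N \to \infty} \mathbb{P}\left(\left|h^*_x(y) - \frac{\mathrm{H}(X^N_{Z_N})}{N \log d}\right| > \varepsilon\, h^*_x(y)\right) = 0, \qquad \forall\, \varepsilon > 0.$$

This readily implies the Theorem, by recalling that $\mathrm{H}(X^N \mid X^N_S) = \mathrm{H}(X^N) - \mathrm{H}(X^N_S)$ and that $\mathrm{H}(X^N)/(N \log d) \to x$ by assumption. □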



Appendix A. Entropy 

In this Appendix, we recall needed facts from basic information theory. The main object is the entropy functional, which may be said to quantify the randomness of a random variable. We refer to [3] for more background.

Let $X$ be a random variable taking values in a finite space $E$. We define the entropy of $X$

$$\mathrm{H}(X) := -\sum_{x \in E} P_X(x) \log(P_X(x)), \qquad P_X(x) := \mathbb{P}(X = x),$$

where we adopt the convention

$$0 \cdot \log(0) = 0 \cdot \log(+\infty) = 0.$$

We recall that

$$0 \le \mathrm{H}(X) \le \log |E|. \tag{A.1}$$

More precisely, $\mathrm{H}(X)$ is minimal iff $X$ is a constant, and it is maximal iff $X$ is uniform over $E$. To prove (A.1), just notice that $f(x) := -x \log x \ge 0$ on $[0, 1]$, with $f(x) = 0$ if and only if $x \in \{0, 1\}$, and that, by strict convexity of $x \mapsto \varphi(x) = x \log x$ and Jensen's inequality,

$$\log |E| - \mathrm{H}(X) = \sum_{x \in E} P_X(x)\big(\log(P_X(x)) + \log |E|\big) = \frac{1}{|E|} \sum_{x \in E} \varphi\big(P_X(x)\,|E|\big) \ge \varphi\Big(\frac{1}{|E|} \sum_{x \in E} P_X(x)\,|E|\Big) = \varphi(1) = 0,$$

with $\log |E| - \mathrm{H}(X) = 0$ if and only if $P_X(x)\,|E|$ is constant in $x \in E$.
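A direct transcription of the definition, with the convention $0 \log 0 = 0$ handled by skipping zero masses, makes (A.1) easy to test numerically. Natural logarithms are used, as in the text; the function name is illustrative.

```python
import math


def entropy(pmf):
    """H(X) = -sum_x P_X(x) log P_X(x) (natural log); zero masses are skipped (0 log 0 = 0)."""
    return -sum(p * math.log(p) for p in pmf if p > 0)


# Example: P_X = (1/2, 1/4, 1/4) gives H(X) = 1.5 log 2, which lies between 0 and log 3.
example = entropy([0.5, 0.25, 0.25])
```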

If we have an $E$-valued random variable $X$ and an $F$-valued random variable $Y$ defined on the same probability space, with $E$ and $F$ finite, we can consider the vector $(X, Y)$ as an $E \times F$-valued random variable. The entropy of $(X, Y)$ is then

$$\mathrm{H}(X, Y) := -\sum_{x, y} P_{(X,Y)}(x, y) \log(P_{(X,Y)}(x, y)), \qquad P_{(X,Y)}(x, y) := \mathbb{P}(X = x, Y = y).$$

This entropy $\mathrm{H}(X, Y)$ does not only depend on the (separate) laws of $X$ and $Y$ but also on the extent to which the "randomness of the two variables is shared". The following notions formalize this idea.

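A.1. Conditional Entropy. The conditional entropy of $X$ given $Y$ is:

$$\mathrm{H}(X \mid Y) := \mathrm{H}(X, Y) - \mathrm{H}(Y).$$

We first claim that it is nonnegative.

Remark that $P_X(x)$ and $P_Y(y)$, defined in the obvious way, are the marginal laws of $P_{(X,Y)}(x, y)$, i.e.

$$P_X(x) = \sum_y P_{(X,Y)}(x, y), \qquad P_Y(y) = \sum_x P_{(X,Y)}(x, y).$$

In particular, $P_X(x) \ge P_{(X,Y)}(x, y)$ for all $x, y$. Therefore

$$\sum_{x, y} P_{(X,Y)}(x, y) \log\left(\frac{P_{(X,Y)}(x, y)}{P_X(x)}\right) \le 0,$$

which yields

$$\mathrm{H}(X, Y) = -\sum_{x, y} P_{(X,Y)}(x, y) \log P_{(X,Y)}(x, y) \ge -\sum_x P_X(x) \log P_X(x) = \mathrm{H}(X),$$

i.e., $\mathrm{H}(Y \mid X) \ge 0$. Exchanging the roles of $X$ and $Y$ gives $\mathrm{H}(X \mid Y) \ge 0$, proving the claim. Therefore

$$\mathrm{H}(X, Y) \ge \max\{\mathrm{H}(X), \mathrm{H}(Y)\}. \tag{A.2}$$

Moreover $\mathrm{H}(X, Y) = \mathrm{H}(X)$, i.e. $\mathrm{H}(Y \mid X) = 0$, if and only if $P_{(X,Y)}(x, y) = P_X(x)$ whenever $P_{(X,Y)}(x, y) \ne 0$, which means that $Y$ is a function of $X$.
On the other hand,

$$\mathrm{H}(X, Y) \le \mathrm{H}(X) + \mathrm{H}(Y) \tag{A.3}$$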
with equality, i.e., $\mathrm{H}(Y \mid X) = \mathrm{H}(Y)$, if and only if $X$ and $Y$ are independent. This can be shown by considering the Kullback-Leibler divergence or relative entropy:

$$I := \sum_{x, y} P_{(X,Y)}(x, y) \log\left(\frac{P_{(X,Y)}(x, y)}{P_X(x)\, P_Y(y)}\right) = \mathrm{H}(X) + \mathrm{H}(Y) - \mathrm{H}(X, Y).$$

Since $\log(\cdot)$ is concave, by Jensen's inequality

$$-I = \sum_{x, y} P_{(X,Y)}(x, y) \log\left(\frac{P_X(x)\, P_Y(y)}{P_{(X,Y)}(x, y)}\right) \le \log\Big(\sum_{x, y} P_X(x)\, P_Y(y)\Big) = 0.$$

By strict concavity, $I = 0$ if and only if $P_{(X,Y)}(x, y) = P_X(x)\, P_Y(y)$ for all $x, y$, i.e., if and only if $X$ and $Y$ are independent.

By the above considerations, $\mathrm{H}(X \mid Y) \in [0, \mathrm{H}(X)]$ is a measure of the uncertainty associated with $X$ if $Y$ is known. It is minimal iff $X$ is a function of $Y$, and it is maximal iff $X$ and $Y$ are independent.

A.2. Mutual Information. Finally, we recall the notion of mutual information between two random variables $X$ and $Y$ defined on the same probability space:

$$\mathrm{MI}(X, Y) := \mathrm{H}(X) + \mathrm{H}(Y) - \mathrm{H}(X, Y) = \mathrm{H}(X) - \mathrm{H}(X \mid Y) = \mathrm{H}(Y) - \mathrm{H}(Y \mid X) = \sum_{x, y} P_{(X,Y)}(x, y) \log\left(\frac{P_{(X,Y)}(x, y)}{P_X(x)\, P_Y(y)}\right).$$

This quantity is a measure of the common randomness of $X$ and $Y$. By (A.2) and (A.3) we have $\mathrm{MI}(X, Y) \in [0, \min\{\mathrm{H}(X), \mathrm{H}(Y)\}]$. $\mathrm{MI}(X, Y)$ is minimal (zero) iff $X, Y$ are independent, and maximal, i.e. equal to $\min\{\mathrm{H}(X), \mathrm{H}(Y)\}$, iff one variable is a function of the other.
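The identities of this appendix can be checked on small joint laws. The sketch below represents $P_{(X,Y)}$ as a dictionary `{(x, y): probability}` (an illustrative choice, with hypothetical function names) and computes $\mathrm{H}(X \mid Y)$ and $\mathrm{MI}(X, Y)$ both from entropies and from the relative-entropy formula.

```python
import math
from collections import defaultdict


def H(masses):
    """Entropy (natural log) of a collection of probability masses, with 0 log 0 = 0."""
    return -sum(p * math.log(p) for p in masses if p > 0)


def marginals(joint):
    """joint: dict {(x, y): P(X = x, Y = y)} -> (P_X, P_Y) as dicts."""
    px, py = defaultdict(float), defaultdict(float)
    for (x, y), p in joint.items():
        px[x] += p
        py[y] += p
    return px, py


def info_quantities(joint):
    """H(X), H(Y), H(X,Y), H(X|Y) = H(X,Y) - H(Y) and MI = H(X) + H(Y) - H(X,Y)."""
    px, py = marginals(joint)
    hx, hy, hxy = H(px.values()), H(py.values()), H(joint.values())
    return {"H(X)": hx, "H(Y)": hy, "H(X,Y)": hxy, "H(X|Y)": hxy - hy, "MI": hx + hy - hxy}


def kl_form_of_mi(joint):
    """MI(X,Y) = sum_{x,y} P(x,y) log(P(x,y) / (P_X(x) P_Y(y)))."""
    px, py = marginals(joint)
    return sum(p * math.log(p / (px[x] * py[y])) for (x, y), p in joint.items() if p > 0)


generic = {(0, 0): 0.3, (0, 1): 0.1, (1, 0): 0.2, (1, 1): 0.4}
q = info_quantities(generic)
```

For an independent pair the mutual information vanishes, and for $Y = X$ the conditional entropy $\mathrm{H}(X \mid Y)$ is zero, matching the equality cases of (A.2) and (A.3).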